CoBrALab / UKBB-tabular-processing

Scripts to handle the tabular data associated with the UK BioBank

`AttributeError: 'ExprArrayNameSpace' object has no attribute 'lengths'` during instance replication step #19

Closed HoumanAzizi closed 1 year ago

HoumanAzizi commented 1 year ago

Hi,

When I run the last step of the code, I get an AttributeError.

I have created a virtual environment with Python 3.9.6 and the packages in requirements.txt. All the necessary files are in the same folder, and I have attached my input .yaml file here (as a zip): config_2023Jul5.yaml.zip

When I run the code, I receive the following error:

$ python melted_UKBB_extract.py --config-file config_2023Jul5.yaml --data-file current.melt.arrow --output-prefix mysubset_
<CLIP>
Traceback (most recent call last):
  File "/lustre04/scratch/houmanaz/data_extraction_gabe/melted_UKBB_extract.py", line 382, in <module>
    data, data_wide, dictionary, codings = extract_UKBB_tabular_data(
  File "/lustre04/scratch/houmanaz/data_extraction_gabe/melted_UKBB_extract.py", line 142, in extract_UKBB_tabular_data
    pl.when(pl.col("InstanceID").arr.lengths() > 1)
AttributeError: 'ExprArrayNameSpace' object has no attribute 'lengths'

Thank you

gdevenyi commented 1 year ago

Please provide the entirety of the output (from the command run, until the command stops), and the versions of packages used.

gdevenyi commented 1 year ago

Also, please confirm you're using the latest version of the code (`git pull`).

gdevenyi commented 1 year ago

Origin of the bug found: you are likely using polars > 0.18.0, where the `arr` namespace changed. See https://github.com/pola-rs/polars/pull/8999
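For anyone hitting the same error on a different polars version, here is a minimal sketch of a version-tolerant lookup. It is not necessarily what the fix in this repository does. The helper name `list_lengths` is hypothetical, and the accessor spellings it tries are assumptions about polars releases: `.list.len()` in recent versions, `.list.lengths()` around 0.18, and `.arr.lengths()` before the namespace rename.

```python
from typing import Any

# Candidate (namespace, method) pairs, newest spelling first.
# Assumption: these are the spellings polars has used over time for
# "length of each list in a list column"; adjust for your installed version.
_CANDIDATES = (("list", "len"), ("list", "lengths"), ("arr", "lengths"))


def list_lengths(expr: Any) -> Any:
    """Return an expression holding the length of each list in `expr`.

    `expr` is expected to be a polars expression such as pl.col("InstanceID").
    The first accessor that exists on the installed version is used.
    """
    for namespace_name, method_name in _CANDIDATES:
        # A missing namespace yields None, and getattr(None, ...) yields None too.
        namespace = getattr(expr, namespace_name, None)
        method = getattr(namespace, method_name, None)
        if callable(method):
            return method()
    raise AttributeError("no known list-length accessor on this expression")
```

With this helper, the failing line from the traceback could be written as `pl.when(list_lengths(pl.col("InstanceID")) > 1)`. Pinning the polars version in requirements.txt is the simpler alternative if only one version needs to be supported.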

gdevenyi commented 1 year ago

Fixed in 8e98f7c4e299b81e125117b2df2ec920fb5b10ca

gdevenyi commented 1 year ago

Note that I ran the code with your config and ran out of memory at 189 GB during the pivoting stage. Are you really sure you want all InstanceIDs? Don't you only want subjects with imaging data? Shouldn't you also be filtering using the file:subjectIDlist feature I added, to keep only QC-passing subjects?

HoumanAzizi commented 1 year ago

Thank you very much for the help. I used the updated version and the code started running. As you mentioned, I would only need the imaging visits so I limited the InstanceIDs to 2 and 3 and was able to successfully run the code on ComputeCanada.

gdevenyi commented 1 year ago

Great, however InstanceID goes 0-3 and I think 1-2 are the imaging timepoints? Check this :)

HoumanAzizi commented 1 year ago

I just double-checked: instances 0 and 1 are the initial and repeat assessment visits, while instances 2 and 3 are the first and repeat imaging visits, respectively. https://biobank.ctsu.ox.ac.uk/crystal/instance.cgi?id=2 Thank you!
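As a small illustration of the mapping above, the sketch below keeps only the imaging instances from a list of melted rows. It uses plain Python and does not show how the extraction script or its YAML config applies the filter. The row layout and the `SubjectID` and `value` keys are hypothetical; only the `InstanceID` name and the instance meanings come from this thread.

```python
# Instance meanings as reported in this thread.
INSTANCE_LABELS = {
    0: "initial assessment visit",
    1: "repeat assessment visit",
    2: "first imaging visit",
    3: "repeat imaging visit",
}

# Instances whose label mentions imaging, i.e. 2 and 3.
IMAGING_INSTANCES = sorted(
    instance for instance, label in INSTANCE_LABELS.items() if "imaging" in label
)


def keep_imaging_rows(rows):
    """Return only the rows whose InstanceID is an imaging visit."""
    return [row for row in rows if row["InstanceID"] in IMAGING_INSTANCES]


# Hypothetical melted rows: one measurement per subject and instance.
example_rows = [
    {"SubjectID": 1, "InstanceID": 0, "value": 10.0},
    {"SubjectID": 1, "InstanceID": 2, "value": 11.5},
    {"SubjectID": 2, "InstanceID": 3, "value": 9.8},
]
imaging_rows = keep_imaging_rows(example_rows)
```

Restricting the instances before the wide pivot also reduces the amount of data that step has to hold in memory.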

gdevenyi commented 1 year ago

Great!