ArnovanHilten / GenNet

Framework for Interpretable Neural Networks
Apache License 2.0

Problems with converting smaller dataset #66

Closed KasperFischer closed 2 years ago

KasperFischer commented 2 years ago

Hi Arno

Thank you for your previous help; it led to major advances in my project! I am now trying out some model specifications on a smaller dataset with 272 variants in PLINK format.

I use the same command to convert the PLINK data as I did with my full dataset (where it worked):

python GenNet.py convert \
  -g /home/path/to/small/plink/file/ \
  -study_name clumped \
  -step all

I am now getting a new error message. I have checked with the larger dataset, and there the above command works fine. The error message I am getting is:

using /home/usr/me/GenNet/processed_data//
Number of Individuals: 6283
Number of Probes 272 in clumped.bim
Number of Probes 272 converted
next 272 SNPs, from 272, need to convert 0
Time to read 272 SNPs is 0.37458252906799316 s
Time to write 272 SNPs is 0.010164976119995117 s
Number of individuals 6283 
  family individual  paternal  maternal  sex  label
0      1   dfg21         0         0    2     -9
1      2   dfg22         0         0    1     -9
2      3   dfg23         0         0    1     -9
3      4   dfg24         0         0    1     -9
4      5   dfg25         0         0    1     -9
Converted number of variants 272
   CHR          ID  distance         bp              allele1              allele2
0    1     exm9151         0    5113911 -5294711266661764450 -8677785269953517192
1    1     exm9152         0    5674434 -6619351268469618139 -8677785269953517192
2    1     exm9153         0    5782769 -6619351268469618139 -8677785269953517192
3    1     exm9154         0    6095199 -5294711266661764450 -2043262203756013014
4    1     exm9155         0   14842486 -2043262203756013014 -5294711266661764450
5    1     exm9156         0   18807755 -5294711266661764450 -2043262203756013014
6    1     exm9157         0   23168961 -8677785269953517192 -6619351268469618139
7    1     exm9158         0   25434242 -6619351268469618139 -8677785269953517192
8    1     exm9159         0   81947930 -8677785269953517192 -5294711266661764450
9    1    exm91510         0  100818728 -2043262203756013014 -6619351268469618139
Number of variants in genotype folder 272
[[x x x ... x x x]
 [x x x ... x x x]
 [x x x ... x x x]
 ...
 [x x x ... x x x]
 [x x x ... x x x]
 [x x x ... x x x]] # in reality there are numbers (0, 1, 2) here, but I have changed them to 'x'
Time to convert all data: 6.648498058319092 sec
Traceback (most recent call last):
  File "GenNet.py", line 158, in <module>
    main(args)
  File "GenNet.py", line 26, in main
    convert(args)
  File "/home/usr/me/GenNet/GenNet_utils/Convert.py", line 414, in convert
    merge_hdf5_hase(args)
  File "/home/usr/me/GenNet/GenNet_utils/Convert.py", line 61, in merge_hdf5_hase
    g = h5py.File(filepath_hase.format(1), 'r')['genotype']
  File "/home/me/env_GenNet/lib/python3.8/site-packages/h5py/_hl/files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/me/env_GenNet/lib/python3.8/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/home/users/me/GenNet/processed_data///genotype/1_clumped.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Do you know how I can convert a reduced dataset to h5 format and continue working with your tool?

ArnovanHilten commented 2 years ago

Hi Kasper,

I'm glad you advanced in your project!

The error seems to indicate that this file needs to exist: '/home/users/me/GenNet/processed_data/genotype/1_clumped.h5'

What files do you have in the genotype folder (/home/users/me/GenNet/processed_data/genotype/)?

It may also be good to check that no older files from your previous conversion are left there, since those could cause some confusion.
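
A quick way to see what is in the genotype folder is a few lines of Python. This is only a sketch: the helper name `list_h5_files` is made up for illustration and is not part of GenNet, and the folder path is the one from the error message, so adjust it to your own setup.

```python
from pathlib import Path


def list_h5_files(folder):
    """Return the sorted names of all .h5 files directly inside `folder`.

    Hypothetical helper for illustration; not part of GenNet.
    Returns an empty list if the folder does not exist.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.h5"))


# Example path copied from the traceback; adjust to your own setup.
for name in list_h5_files("/home/users/me/GenNet/processed_data/genotype/"):
    print(name)
```

Any file whose name does not end in `_clumped.h5` would most likely be left over from an earlier conversion.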

Best,

Arno

ArnovanHilten commented 2 years ago

Dear Kasper

I found the error https://github.com/ArnovanHilten/GenNet/blob/19ac2d061d2828f083947f4738679a6d9d40b23f/GenNet_utils/Convert.py#L61

The line should be:

g = h5py.File(filepath_hase.format(0), 'r')['genotype']

I fixed it in the dev branch https://github.com/ArnovanHilten/GenNet/blob/c3a30498707b152ec9c09a2ed1f8102d0ee5e2f0/GenNet_utils/Convert.py#L61

So the 1 should be a 0. You can simply change .format(1) to .format(0) on line 61 in GenNet_utils/Convert.py and it should work.
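
To illustrate why the index matters (a sketch under assumptions, not the actual GenNet code): if the converted genotype chunks are numbered from 0, a dataset small enough to fit in a single chunk only produces `0_clumped.h5`, so `.format(1)` points at a file that was never written, while a larger dataset with several chunks also has a `1_clumped.h5`. The helper `chunk_filenames`, the chunk size of 1000, and the 2500-variant example are hypothetical; only the `{}_clumped.h5` file name pattern is taken from the error message above.

```python
import math


def chunk_filenames(n_variants, chunk_size, study_name):
    """Hypothetical helper: names of the chunk files, numbered from 0."""
    n_chunks = math.ceil(n_variants / chunk_size)
    return [f"{i}_{study_name}.h5" for i in range(n_chunks)]


template = "{}_clumped.h5"  # pattern seen in the error message

# Assumed chunk size of 1000 variants, purely for illustration.
small = chunk_filenames(272, 1000, "clumped")   # single chunk
large = chunk_filenames(2500, 1000, "clumped")  # several chunks

print(small)                        # ['0_clumped.h5']
print(template.format(1) in small)  # False: the file the old code asked for
print(template.format(0) in small)  # True: the file the fixed code asks for
print(template.format(1) in large)  # True: why the bug stayed hidden before
```

Chunk 0 exists for any non-empty dataset, which is why reading from `.format(0)` works for both the small and the large conversion.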

Thanks for spotting this bug.

Best,

Arno