AlexandreRozier / DeepCombi

A repository for the DeepCombi project

OSError #1

Open kmarianski opened 2 years ago

kmarianski commented 2 years ago

Hi, I ran the first step to generate synthetic genotypes:

ROOT_DIR=$PWD SGE_TASK_ID=1 python -m pytest -s tests/test_data_generation.py::TestDataGeneration::test_synthetic_genotypes_generation --rep 1000

It fails with this error:

OSError: Unable to open file (unable to open file: name = 'scratch/private/DeepCombi/data/WTCCC/CD/chromo_2.mat', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Could you please include the 'data' folder if possible?

Also, this might belong in a separate issue, but would you be able to provide an end-to-end DeepCOMBI implementation for real-world, open-source data, using bim, bed, fam and pheno/sample files?

Thanks! P.S. Great work, I hope many geneticists will start using your model.

bettinanana commented 2 years ago

Please note that to generate datasets you need two real datasets to sample from. We use the WTCCC data and randomly select 300 subjects from the Crohn's disease dataset. Unfortunately, we are not authorized to publish this data, so you will have to save your own datasets in the corresponding .mat files. Each .mat file should contain a simple array of characters in which the number of rows equals the number of subjects and the number of columns equals the number of SNPs * 3 (two letters for the genotype plus one space). A small excerpt with three subjects and four SNPs would look like this:

AA AA CG GG
AT AA GG GG
TT AT CC GT
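
For anyone preparing their own data, here is a minimal Python sketch of the layout described above. It is not code from the repository. The function name `encode_genotypes` and the input format (one list of two-letter genotype strings per subject) are assumptions made for illustration. It only builds the character rows and checks their width. Writing them to a .mat file is left out, because this thread does not state which variable name or MAT-file version the loader expects.

```python
def encode_genotypes(subjects):
    """Turn per-subject genotype lists into fixed-width character rows.

    Hypothetical helper: each genotype is written as two letters followed
    by one space, so a row covering n SNPs is exactly n * 3 characters wide.
    """
    if not subjects:
        raise ValueError("need at least one subject")

    n_snps = len(subjects[0])
    rows = []
    for i, genotypes in enumerate(subjects):
        if len(genotypes) != n_snps:
            raise ValueError(
                f"subject {i} has {len(genotypes)} SNPs, expected {n_snps}"
            )
        for genotype in genotypes:
            if len(genotype) != 2 or not genotype.isalpha():
                raise ValueError(f"subject {i}: bad genotype {genotype!r}")
        # Two letters plus one trailing space per SNP.
        rows.append("".join(genotype.upper() + " " for genotype in genotypes))
    return rows


# The three-subject, four-SNP excerpt from the comment above.
example = [
    ["AA", "AA", "CG", "GG"],
    ["AT", "AA", "GG", "GG"],
    ["TT", "AT", "CC", "GT"],
]

rows = encode_genotypes(example)

# One character per cell: 3 rows (subjects) by 4 * 3 = 12 columns.
char_matrix = [list(row) for row in rows]

for row in rows:
    print(repr(row))
```

Each row ends with a space, so every row has exactly SNPs * 3 characters. The nested list `char_matrix` can then be converted to whatever array type your .mat writer needs.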