cnr-ibba / SMARTER-database

Smarter database repository
https://smarter-database.readthedocs.io/en/latest/
MIT License

:zap: enhance genotype conversion #114

Open bunop opened 9 months ago

bunop commented 9 months ago

Is your feature request related to a problem? Please describe.

SNP genotype conversion is very slow. For example, when the original file is already in plink binary format, conversion still goes through temporary plink text files, which takes too much time. Converting the whole dataset into FORWARD (see #111) generates huge temporary files. Going from binary to text is itself costly, because the information for a single sample is column based in the binary format and row based in the text format, so the data has to be transposed.

Describe the solution you'd like

Ideally, data conversion should be done on binary formats, not text. If a plink binary file can be converted by changing only the alleles in its .bim file, that is how it should be done.
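To make the .bim idea concrete, here is a minimal sketch, not existing SMARTER-database code. It assumes a tab-delimited .bim with the six standard plink columns (chromosome, variant ID, cM position, bp position, allele 1, allele 2) and a conversion that is a pure per-allele relabeling, so the .bed and .fam files can stay untouched. The function name `recode_bim` and the `allele_map` lookup are hypothetical.

```python
import csv
import io


def recode_bim(src, dst, allele_map):
    """Copy a .bim stream, translating only the two allele columns.

    allele_map is a hypothetical lookup of the form
    {variant_id: {old_allele: new_allele}}. Variants or alleles that
    are not in the lookup (e.g. the missing code "0") pass through
    unchanged.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for chrom, snp_id, cm, pos, a1, a2 in reader:
        mapping = allele_map.get(snp_id, {})
        writer.writerow(
            [chrom, snp_id, cm, pos, mapping.get(a1, a1), mapping.get(a2, a2)]
        )


# Tiny in-memory demo; real use would pass open file handles instead.
demo_src = io.StringIO("1\tsnp1\t0\t1000\tA\tG\n1\tsnp2\t0\t2000\t0\tC\n")
demo_dst = io.StringIO()
recode_bim(demo_src, demo_dst, {"snp1": {"A": "T", "G": "C"}})
print(demo_dst.getvalue(), end="")
```

Since the .bed file only records genotypes relative to allele 1 and allele 2, relabeling those two columns leaves the genotype calls consistent, as long as the new coding keeps the same allele in each position. Whether every conversion in this project satisfies that condition would need to be checked.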

Describe alternatives you've considered

We have been working with text files until now, and it works. However, it is very inefficient.

Additional context