KarchinLab / open-cravat-modules-karchinlab

MIT License

GDS File Converter Module #8

Closed The-Jacob-Lopez closed 2 years ago

The-Jacob-Lopez commented 2 years ago

Motivation The Genomic Data Structure (GDS) is a space-efficient file format for storing variant information, with many of the same benefits as VCF. These code changes implement a GDS converter module.

Changes Changes to converters The converters folder in open-cravat-modules-karchinlab now contains an additional folder, gds-converter, which holds the GDS converter module along with the other expected files. The GDS converter outputs variant information in the expected format. Its general structure is as follows. The Python script starts an R session through the Python rpy2 library and passes the input GDS file to it. The R side then uses SeqArray, an R library built to read GDS files efficiently. The relevant variant information is accessed, parsed, and returned to the main Python script. The Python script then reformats the variant information into the expected format and yields the result.
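The flow described above could be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function names `read_gds_variants` and `to_variant_records` are hypothetical, the record keys (`chrom`, `pos`, `ref_base`, `alt_base`) are an assumption about what "the expected format" looks like, and the sketch assumes SeqArray's `seqGetData` returns alleles as comma-separated strings with the reference allele first (for example `"A,G"`).

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_gds_variants(gds_path: str) -> Tuple[List[str], List[int], List[str]]:
    """Read chromosome, position and allele vectors from a GDS file via R.

    Hypothetical helper: requires R, the SeqArray package and rpy2.
    """
    # Imported lazily so the pure-Python part below works without R installed.
    from rpy2.robjects.packages import importr

    seqarray = importr("SeqArray")
    gds = seqarray.seqOpen(gds_path)
    try:
        chroms = [str(c) for c in seqarray.seqGetData(gds, "chromosome")]
        positions = [int(p) for p in seqarray.seqGetData(gds, "position")]
        alleles = [str(a) for a in seqarray.seqGetData(gds, "allele")]
    finally:
        # Always release the file handle, even if a read fails.
        seqarray.seqClose(gds)
    return chroms, positions, alleles


def to_variant_records(
    chroms: Iterable[str],
    positions: Iterable[int],
    alleles: Iterable[str],
) -> Iterator[Dict[str, object]]:
    """Yield one record per alternate allele.

    The key names are assumed for illustration, not taken from the module.
    """
    for chrom, pos, allele in zip(chroms, positions, alleles):
        # Assumed layout: reference allele first, then the alternates.
        ref, *alts = allele.split(",")
        for alt in alts:
            yield {"chrom": chrom, "pos": pos, "ref_base": ref, "alt_base": alt}
```

With R available, the two steps would be chained as `to_variant_records(*read_gds_variants("example.gds"))`, where `example.gds` is a placeholder path. Keeping the R access in one function and the reformatting in another means the reformatting logic can be exercised without an R installation.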

Relevance Use of R, rpy2, and SeqArray The use of these additional resources was a result of certain technical difficulties including: