cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License

twas imputability/variant selection for ctwas #991

Closed: Chunmingl closed this 4 months ago

gaow commented 4 months ago

@Chunmingl I'm a bit hesitant to use SQLite for the intermediate files that hold the summary stats, because in my experience the format is easily corrupted and not very robust. However, I'm open to trying it in this context to see whether it works better. We don't have a large dataset for the gene-level information anyway, so either approach will probably work out.
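
A minimal sketch of what the SQLite route could look like, using Python's standard-library `sqlite3` module. The `gene_summary` table, its columns, and the helper names (`open_summary_db`, `upsert_gene`, `is_intact`) are hypothetical, not the pipeline's actual schema. Each write runs in its own transaction, so a failed write is rolled back instead of leaving a partial row, and `PRAGMA integrity_check` offers a cheap way to detect a damaged file before trusting it. This reduces, but does not remove, the corruption risk raised above.

```python
import os
import sqlite3
import tempfile


def open_summary_db(path):
    """Open (or create) a gene-level summary database (hypothetical schema)."""
    conn = sqlite3.connect(path, timeout=30)  # wait up to 30 s for a lock instead of failing
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene_summary ("
        "gene_id TEXT NOT NULL, data_type TEXT NOT NULL, "
        "n_variants INTEGER, imputable INTEGER, "
        "PRIMARY KEY (gene_id, data_type))"
    )
    return conn


def upsert_gene(conn, gene_id, data_type, n_variants, imputable):
    """Insert a gene's summary, replacing any existing row for the same (gene, data type)."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO gene_summary VALUES (?, ?, ?, ?)",
            (gene_id, data_type, n_variants, int(imputable)),
        )


def is_intact(conn):
    """Run SQLite's built-in consistency check; True means no corruption was found."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


# Usage with placeholder gene IDs:
db_path = os.path.join(tempfile.mkdtemp(), "gene_summary.sqlite")
conn = open_summary_db(db_path)
upsert_gene(conn, "geneA", "eQTL", 120, True)
upsert_gene(conn, "geneA", "eQTL", 95, False)  # re-processed gene replaces its old row
rows = conn.execute("SELECT * FROM gene_summary").fetchall()
intact = is_intact(conn)
conn.close()
print(rows)  # [('geneA', 'eQTL', 95, 0)]
```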

Chunmingl commented 4 months ago

@gaow I'm thinking we could generate summary info for each gene and save it as a small per-gene summary file. If genes are re-processed for the same data type (i.e., the same summary file name), that gene's summary file is simply overwritten with the updated information. We can then run a separate script to combine all the summary files in a directory into a single meta file.
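
The per-gene-file idea could be sketched in Python roughly as follows. The file naming pattern `<data_type>.<gene_id>.summary.tsv`, the column names, and the functions `write_gene_summary` and `merge_summaries` are hypothetical. Each summary is written to a temporary file and then moved into place with `os.replace`, which is an atomic rename on POSIX when both paths are on the same filesystem, so the merge step sees either the old or the new version of a gene's file, never a half-written one.

```python
import csv
import glob
import os
import tempfile

FIELDS = ["gene_id", "data_type", "n_variants", "imputable"]  # hypothetical columns


def write_gene_summary(out_dir, record):
    """Write one gene's summary to its own small TSV file.

    The name depends only on (data_type, gene_id), so re-processing the same gene
    for the same data type overwrites the previous file.
    """
    path = os.path.join(out_dir, f"{record['data_type']}.{record['gene_id']}.summary.tsv")
    # Write to a temp file in the same directory, then rename it into place atomically.
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerow(record)
    os.replace(tmp, path)
    return path


def merge_summaries(out_dir, meta_path):
    """Concatenate every per-gene summary file in out_dir into a single meta file."""
    files = sorted(glob.glob(os.path.join(out_dir, "*.summary.tsv")))
    with open(meta_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for f in files:
            with open(f, newline="") as fh:
                writer.writerows(csv.DictReader(fh, delimiter="\t"))
    return len(files)


# Usage with placeholder genes:
work = tempfile.mkdtemp()
write_gene_summary(work, {"gene_id": "geneA", "data_type": "eQTL", "n_variants": 120, "imputable": True})
write_gene_summary(work, {"gene_id": "geneB", "data_type": "eQTL", "n_variants": 40, "imputable": False})
# Re-processing geneA overwrites its file instead of adding a duplicate row.
write_gene_summary(work, {"gene_id": "geneA", "data_type": "eQTL", "n_variants": 95, "imputable": True})

meta = os.path.join(tempfile.mkdtemp(), "meta_summary.tsv")
n_files = merge_summaries(work, meta)
with open(meta, newline="") as fh:
    merged = list(csv.DictReader(fh, delimiter="\t"))
print(n_files)  # 2
```

Keeping one file per gene means concurrent jobs never write to the same file, and the meta file is only ever produced by the single merge step.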

I considered SQLite because, if the pipeline constantly updates a single meta file, we run into problems when multiple processes try to read and write it at the same time.
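
To illustrate the concurrency point, the sketch below has several writers update one SQLite file at the same time. Threads, each with its own connection, stand in for separate pipeline processes here; SQLite locks the database file itself, so the same serialization applies across processes on a local filesystem. Opening with `isolation_level=None` and issuing `BEGIN IMMEDIATE` makes a writer claim the write lock up front and wait (up to `timeout`) when another writer holds it, rather than failing mid-transaction. The schema and the `init_db`/`worker` functions are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    """Create the shared table once, before any writer starts (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene_summary ("
        "gene_id TEXT, data_type TEXT, n_variants INTEGER, "
        "PRIMARY KEY (gene_id, data_type))"
    )
    conn.commit()
    conn.close()


def worker(path, genes, data_type):
    """Simulate one pipeline job writing its genes into the shared database."""
    # Each job uses its own connection; isolation_level=None means we manage transactions.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        for gene_id, n_variants in genes:
            # BEGIN IMMEDIATE takes the write lock up front; if another writer holds it,
            # SQLite retries until `timeout` expires instead of failing mid-transaction.
            conn.execute("BEGIN IMMEDIATE")
            conn.execute(
                "INSERT OR REPLACE INTO gene_summary VALUES (?, ?, ?)",
                (gene_id, data_type, n_variants),
            )
            conn.execute("COMMIT")
    finally:
        conn.close()


# Four concurrent "jobs", each writing 25 placeholder genes.
db_path = os.path.join(tempfile.mkdtemp(), "gene_summary.sqlite")
init_db(db_path)
jobs = [
    threading.Thread(
        target=worker,
        args=(db_path, [(f"gene{j}_{i}", i) for i in range(25)], "eQTL"),
    )
    for j in range(4)
]
for t in jobs:
    t.start()
for t in jobs:
    t.join()

conn = sqlite3.connect(db_path)
n_rows = conn.execute("SELECT COUNT(*) FROM gene_summary").fetchone()[0]
conn.close()
print(n_rows)  # 100
```

The SQLite documentation cautions that its file locking can be unreliable on some network filesystems, which is worth checking before relying on this pattern on shared cluster storage.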

Chunmingl commented 4 months ago

Could you tell me more about the export script idea you mentioned in the meeting?