add geno align and pvalue ordering and remove fsusie sumstats in export

rfeng2023 commented 7 months ago

Tested on e/s/p/ha/m/meta QTL with demo data. In this version:

All variant names are aligned to the aligned big geno bim file, and the effect size is multiplied by -1 whenever a flip occurs.

fsusie does not have summary statistics results, so I skipped the related part in the fsusie export.
Added ordering and picking the top 1 in the isoform cases. In the sQTL case, I picked the top 1 lf (I don't know whether that one is productive or unproductive), the top 1 PSI, and the top 1 UNPRODUCTIVE lf2 from each context.
R does not allow saving a list in parquet format; the list has to be converted to a table first, which would cost a lot of time and damage our export data structure. See https://arrow.apache.org/docs/r/reference/write_parquet.html. What about testing the HDF5 format?
Modified the cs update function: if the pip of every variant in a cs is < 0.05 but the cs passes the minimum correlation criteria, the cs is not removed in this version.
Two more inputs are needed in the analysis: the aligned geno bim file (for 1) and the context meta (for 2).
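
As an illustration of the alignment rule described above (rename a variant to match the aligned bim file and multiply the effect size by -1 when its alleles are swapped), here is a minimal Python sketch. It is not the code in this PR, which is in R. The function name `align_variant`, the `chrom:pos:allele1:allele2` ID layout, and the choice to return `None` for unmatched variants are all assumptions made for the example; strand flips are not handled.

```python
def align_variant(variant_id, beta, reference_ids):
    """Align one variant to a reference set of variant IDs.

    Hypothetical helper: assumes IDs look like 'chrom:pos:allele1:allele2'.
    Returns (aligned_id, aligned_beta), or None if the variant is not in
    the reference in either allele order.
    """
    # Already matches the reference: keep the name and the effect as they are.
    if variant_id in reference_ids:
        return variant_id, beta

    # Try the same position with the two alleles swapped.
    chrom, pos, allele1, allele2 = variant_id.split(":")
    flipped_id = f"{chrom}:{pos}:{allele2}:{allele1}"
    if flipped_id in reference_ids:
        # A flip happened: use the reference name and negate the effect size.
        return flipped_id, -beta

    # Not present in the reference under either allele order.
    return None


# Made-up IDs standing in for the contents of the aligned bim file.
reference = {"chr1:100:A:G", "chr1:200:C:T"}
```

With this made-up reference, `align_variant("chr1:200:T:C", 0.5, reference)` is renamed to the reference ID and its effect is negated, while a variant that already matches is returned unchanged.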

Not finished yet:

GWAS export modification (will check the new results)
overlapping part of the export (remove the flipping check)

rfeng2023 commented 7 months ago

Hi @gaow, another concern with the current export is that we cannot find the corresponding original file (one gene could have multiple original files) through the condition_top_loci column in the meta file, which only has the context name. That means users who need to load the original data would not know which file to load unless they are very familiar with the data, especially in the case where an eQTL context is stored in a file named as pQTL. One solution is to add the original file prefix to each context in that meta data table, but I am a little hesitant about that because I don't want the meta to look messy, especially since we can have a lot of contexts in that file. Another solution is to ask users to load the context meta and extract a similar string from it, but that is still not convenient. However, I think you mentioned that we will export all susie data to one file. In that case this might not be a problem, since people may no longer need to use the original files.
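
To make the first option concrete (recording the original file prefix for each context), here is a minimal Python sketch of a gene-and-context lookup. It is only an illustration under stated assumptions: the function `build_prefix_index`, the record layout, and every gene, context, and prefix name below are made up and do not come from the actual meta file.

```python
from collections import defaultdict


def build_prefix_index(records):
    """Map (gene, context) to the list of original file prefixes.

    Hypothetical helper: `records` is an iterable of
    (gene, context, original_file_prefix) tuples.
    """
    index = defaultdict(list)
    for gene, context, prefix in records:
        # Keep each prefix once per (gene, context), in first-seen order.
        if prefix not in index[(gene, context)]:
            index[(gene, context)].append(prefix)
    return dict(index)


# Made-up records, including an eQTL context stored in a file named as pQTL.
records = [
    ("GENE1", "context_eQTL", "cohortA_pQTL"),
    ("GENE1", "context_pQTL", "cohortA_pQTL"),
    ("GENE1", "context_eQTL", "cohortB_eQTL"),
]
index = build_prefix_index(records)
```

Looking up `index[("GENE1", "context_eQTL")]` then lists every original file prefix for that context, so the context column itself can stay limited to context names.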

cumc / xqtl-protocol

add geno align and pvalue ordering and remove fsusie sumstats in export #963