Data Export - GitHub Issues

chrtannus commented 1 month ago

Add these files to the download zip, under the data directory:

[ ] The iRegulon results, as a .txt file (TSV format)[^1]
[ ] The query parameters, including the "hidden" default ones (e.g. query gene names, organism, ranking parameters, etc.), as a .txt file (TSV format, parameter-name[\t]value)[^2]; see #2 and #3
[ ] An IRF.gz file that can be used to load the results in the Cytoscape app (see the Load and Save buttons in the Cytoscape app's Results Panel).

[^1]: The iRegulon results are in TSV format, but some columns hold lists of values (such as gene names), with the items separated by ;. I don't know if we should just export the raw results (see attached file) or denormalize these list values (e.g. target, rank) into separate TSV files. I'm not sure which is more convenient for the user, but the raw iRegulon format is probably the better choice.

[^2]: Just like in the iRegulon results, the query genes must be separated by ;.
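
For comparison, the denormalization option from footnote 1 could look roughly like the sketch below. It expands a single ;-separated column into one row per item. The column names (`motif`, `targets`) and the helper name are made up for illustration and are not taken from the actual iRegulon output; paired list columns such as target and rank would need to be expanded together, which this sketch does not attempt.

```javascript
// Hypothetical helper: expand one ";"-separated list column of a TSV string
// into one row per list item. Column names are illustrative assumptions.
function denormalizeColumn(tsv, columnName) {
  // Strip trailing newlines only, so trailing empty cells (tabs) survive.
  const lines = tsv.replace(/(\r?\n)+$/, "").split(/\r?\n/);
  const [headerLine, ...rowLines] = lines;
  const index = headerLine.split("\t").indexOf(columnName);
  if (index === -1) {
    throw new Error(`Unknown column: ${columnName}`);
  }

  const out = [headerLine];
  for (const line of rowLines) {
    const cells = line.split("\t");
    const items = (cells[index] || "").split(";").filter((s) => s !== "");
    // Keep rows whose list is empty so that no result is silently dropped.
    if (items.length === 0) {
      out.push(line);
      continue;
    }
    for (const item of items) {
      const copy = cells.slice();
      copy[index] = item;
      out.push(copy.join("\t"));
    }
  }
  return out.join("\n");
}

const raw = "motif\ttargets\nm1\tHIF1A;VEGFA\nm2\tEPAS1";
console.log(denormalizeColumn(raw, "targets"));
```

Exporting the raw file avoids this step entirely, which is one argument for the raw format.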

mikekucera commented 2 days ago

I was thinking of rebuilding the iRegulon results from the stored JSON. But it's much easier to just store the entire results as a string in mongo, then return it as-is when exporting. This uses more storage space, but I honestly don't think it will be a problem. We could always delete them manually if it ever came to it.

mikekucera commented 2 days ago

For the query parameters, is that just what's sent to the service here? https://github.com/cytoscape/iregulon-webapp/blob/main/src/server/routes/api/create.js#L35

Are they ever sent back from the service? It seems they will have to be stored in memory until the service job is complete, or fails or times out?

chrtannus commented 2 days ago

Saving the raw (string) iRegulon results in mongo sounds like a good idea.

Yes, those should be all the parameters for the main iRegulon query. But remember that the Metatargetome Query will have different parameters -- see #3

The parameters are not sent back from the iRegulon service, so they do have to be saved in memory until the job is complete.
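
A minimal sketch of that idea, assuming each job has some unique id: keep the parameters in a `Map` until the job finishes, then write them out as parameter-name[\t]value lines with list values joined by ;. The function names, the job id, and the parameter names (`organism`, `genes`) are hypothetical and not taken from the webapp code.

```javascript
// In-memory store of query parameters, keyed by a (hypothetical) job id.
const pendingParams = new Map();

function rememberParams(jobId, params) {
  pendingParams.set(jobId, params);
}

// Remove and return the parameters once the job completes, fails, or times
// out, so that entries do not accumulate in memory.
function takeParams(jobId) {
  const params = pendingParams.get(jobId);
  pendingParams.delete(jobId);
  return params;
}

// Serialize as "parameter-name<TAB>value" lines; array values use ";".
function paramsToTsv(params) {
  return Object.entries(params)
    .map(([name, value]) => {
      const text = Array.isArray(value) ? value.join(";") : String(value);
      return `${name}\t${text}`;
    })
    .join("\n");
}

rememberParams("job-1", { organism: "human", genes: ["HIF1A", "VEGFA"] });
console.log(paramsToTsv(takeParams("job-1")));
```

Anything held only in memory is lost if the server restarts while a job is running, so this sketch would need a fallback for that case.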

mikekucera commented 17 hours ago

Christian, the end of the IRF file that was output from the iRegulon app looks like this... I don't know what these parameters mean or where they come from. The app needs them or else the file can't be imported. Do you have any idea where this data comes from or if it's safe to hard-code any of it?

<eScore>3.0</eScore>
<thresholdForVisualisation>5000</thresholdForVisualisation>
<rocThresholdAUC>0.03</rocThresholdAUC>
<speciesNomenclature>
  <code>1</code>
</speciesNomenclature>
<iRegulonType>PREDICTED_REGULATORS</iRegulonType>
<name>hypoxia_geneset.txt</name>
<motifCollection>10K (9713 PWMs)</motifCollection>
<trackCollection>1120 ChIP-seq tracks (ENCODE raw signals)</trackCollection>
<minOrthologous>0.0</minOrthologous>
<maxMotifSimilarityFDR>0.001</maxMotifSimilarityFDR>
<isRegionBased>false</isRegionBased>
<motifRankingsDatabase>
  <code>hg19_tss_centered_10kb_7sp_mc_v6</code>
  <name>20kb centered around TSS (7 species)</name>
  <delineationDefault>
    <code></code>
    <name></name>
  </delineationDefault>
  <NESvalue>3.0</NESvalue>
  <AUCvalue>0.03</AUCvalue>
  <visualisationValue>5000</visualisationValue>
</motifRankingsDatabase>
<trackRankingsDatabase>
  <code>hg19_tss_centered_10kb_chip_v1</code>
  <name>20kb centered around TSS (ChIP-seq-derived)</name>
  <delineationDefault>
    <code></code>
    <name></name>
  </delineationDefault>
  <NESvalue>3.0</NESvalue>
  <AUCvalue>0.03</AUCvalue>
  <visualisationValue>5000</visualisationValue>
</trackRankingsDatabase>
<overlap>-1.0</overlap>
<delineation/>
<upstream>-1</upstream>
<downstream>-1</downstream>
<attributeName>name</attributeName>
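
If these values turn out to come from the query parameters rather than being constants, one option is to generate this part of the file from a stored object instead of hard-coding the text. The sketch below only shows the serialization step. The helper names are hypothetical, and whether the Cytoscape app's importer is sensitive to element order or whitespace is an open assumption here.

```javascript
// Escape the characters that are not allowed in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a plain object as nested XML elements, two spaces per level.
// Values are passed as strings so that "3.0" is not printed as "3".
function toXml(obj, indent = "") {
  return Object.entries(obj)
    .map(([tag, value]) => {
      if (value !== null && typeof value === "object") {
        const inner = toXml(value, indent + "  ");
        return `${indent}<${tag}>\n${inner}\n${indent}</${tag}>`;
      }
      return `${indent}<${tag}>${escapeXml(value)}</${tag}>`;
    })
    .join("\n");
}

console.log(
  toXml({
    eScore: "3.0",
    speciesNomenclature: { code: "1" },
    name: "hypoxia_geneset.txt",
  })
);
```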

cytoscape / iregulon-webapp

Data Export #10