COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Dealing with --dumpUmiGraph #392

alexmascension closed this issue 5 years ago

alexmascension commented 5 years ago

Hi,

I found the --dumpUmiGraph option in alevin, which I suppose produces the file needed to draw the UMI / cell barcode plot. However, after running the analysis, I don't know how to turn that output into a useful plot.

What do I have to do to obtain that plot?

Thanks

k3yavi commented 5 years ago

Hi @alexmascension ,

Thanks again for your question. My apologies, we are working on updating the help document to point to the relevant parsers. You can find a Python parser for the cel_umi_graphs.gz file generated by alevin here. It generates a Graphviz dot file for each cell.

Do let us know if you face any difficulty in generating the graph files, or if you have ideas about dumping the graphs in a better format. If you have the code, we can also add a function to the Python parser to dump the graphs as write_X.
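To make the write_X idea concrete, here is a minimal sketch of what such a writer could look like for the Graphviz dot format. It assumes one cell's graph has already been parsed into a list of (source, target) vertex pairs; it does not read cel_umi_graphs.gz itself, and the name write_dot and its parameters are hypothetical, not part of the actual parser.

```python
def write_dot(edges, name="cell", directed=True):
    """Return Graphviz dot text for a graph given as (source, target) pairs.

    Hypothetical helper: `edges` is assumed to be already parsed for one
    cell; nothing here reads the alevin graph file itself.
    """
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = ['{} "{}" {{'.format(keyword, name)]
    for source, target in edges:
        # Quote vertex ids so that arbitrary labels are valid dot identifiers.
        lines.append('    "{}" {} "{}";'.format(source, arrow, target))
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file such as ACGT.dot and rendered with the Graphviz command-line tool, for example `dot -Tpng ACGT.dot -o ACGT.png`.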

Hope it helps!

alexmascension commented 5 years ago

Hi @k3yavi ,

I am a bit confused about the cel_umi_graphs.gz file. You sent me this image by email:

[image: dists]

However, the cel_umi_graphs.gz file I obtained contains only the cells with a correctly assigned CB, and the rest of the cells do not appear. Is there any way to create the plot with all cells?

k3yavi commented 5 years ago

Hi @alexmascension ,

The image above is not based on the cel_umi_graphs file; it is based on raw_cb_frequency.txt, which you can generate with the --dumpFeatures flag. The graph file contains the parsimonious UMI graphs (PUGs) that alevin uses for UMI deduplication, whereas the plot above comes from a much earlier step, i.e. extracting the knee from the CB frequency distribution. You can get the CB frequencies from the raw_cb_frequency.txt file.
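As a rough illustration of how such a knee plot can be drawn from the frequency file, here is a minimal Python sketch. It assumes each line of raw_cb_frequency.txt holds a barcode and its read count separated by whitespace, with no header line; that layout is an assumption, so check it against your own file. The function names are hypothetical, and matplotlib is only needed for the final plotting step.

```python
def read_cb_frequencies(lines):
    """Parse 'barcode count' lines into a list of (barcode, count) tuples.

    Assumed layout: two whitespace-separated fields per line, no header.
    Lines with fewer than two fields are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        records.append((fields[0], int(fields[1])))
    return records


def knee_curve(records):
    """Return (ranks, counts) with counts sorted from largest to smallest."""
    counts = sorted((count for _, count in records), reverse=True)
    ranks = list(range(1, len(counts) + 1))
    return ranks, counts


def plot_knee(ranks, counts, out_png="cb_knee.png"):
    """Save a log-log rank vs. frequency plot (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.loglog(ranks, counts)
    ax.set_xlabel("Cell barcode rank")
    ax.set_ylabel("Barcode frequency")
    fig.savefig(out_png)
    plt.close(fig)
```

Typical use would be to open the file, pass the file handle to read_cb_frequencies, and feed the result of knee_curve to plot_knee. Because every observed barcode is listed, the curve covers all cells, not only the whitelisted ones.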

By default, alevin deduplicates only a subset of the CBs, chosen by the initial whitelisting. If you need the PUGs for all (or most) of the CBs, this issue https://github.com/COMBINE-lab/salmon/issues/379 should be useful in understanding how to get them.

Hope it makes sense.

alexmascension commented 5 years ago

Hi @k3yavi ,

I have managed to get the plots at last. Thanks for the info!