ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

Visualize.py? #222

Closed ccmeth closed 3 months ago

ccmeth commented 3 months ago

Hi, developers,

Thanks a lot for providing such a powerful tool for analyzing ONT data. I installed IsoQuant 3.4.2 with "conda create -c conda-forge -c bioconda -n isoquant python=3.8 isoquant". Today I noticed that visualize.py can be used to interpret the results. However, I did not find it in the 3.4.2 environment, so I downloaded it directly from GitHub. Unfortunately, I ran into an error when installing the src module via "pip install src": AttributeError: 'NoneType' object has no attribute 'has_pure_modules'

By the way, I guess that visualize.py was introduced in the latest version, v3.5.0. Would reinstalling IsoQuant at v3.5.0 address this issue?

ccmeth commented 3 months ago

Aha... I think reinstalling IsoQuant at v3.5.0 will address this issue, since visualize.py does not use the src package installed via pip.

jackfreeman88 commented 3 months ago

Hi @ccmeth - just checking in to see if you resolved your issue. While visualize.py was implemented in version 3.5, we have added functionality to process IsoQuant runs from previous versions - it just takes longer!

ccmeth commented 3 months ago

Thanks for your reply! After reinstalling IsoQuant at v3.5, I could use visualize.py directly. However, I ran into several issues. First, I ran IsoQuant (v3.5) on FASTQ files and generated an output directory named "PG_VS_AG_new", which contains two subdirectories, PG and AG, as shown below.

image

However, when I followed the instructions and ran "visualize.py PG_VS_AG_new/ --gene_list genelist.txt --viz_output PG_VS_AG_new_vis", I got this error: "Directory not found: PG_VS_AG_new/OUT" "FileNotFoundError: Specified sample subdirectory does not exist."

I therefore changed the output directory path from "PG_VS_AG_new/" to "PG_VS_AG_new/PG/". Unfortunately, another error occurred: "AssertionError: Params file not found: PG_VS_AG_new/PG/.params". What does "params" mean here?

jackfreeman88 commented 3 months ago

Hi @ccmeth, I see, thank you for the context. ".params" is a file (a pickle serialization) that we generate to capture all of your parameters. Would you mind sending your isoquant.log? Thanks!
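
For readers unfamiliar with pickle files, here is a minimal sketch of how such a file could be inspected with the Python standard library. It is not IsoQuant's own code: the helper name load_params, the assumption that the pickled object is an argparse.Namespace, and the sample values are all hypothetical, and the demo writes its own stand-in file rather than reading a real run.

```python
import argparse
import os
import pickle
import tempfile


def load_params(output_dir, filename=".params"):
    """Unpickle a parameters file stored in an output directory (hypothetical helper)."""
    path = os.path.join(output_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Params file not found: {path}")
    # Only unpickle files you created yourself: pickle can run code on load.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a real run: write a made-up parameter set, then read it back.
with tempfile.TemporaryDirectory() as demo_dir:
    fake_params = argparse.Namespace(data_type="nanopore", threads=4)  # hypothetical values
    with open(os.path.join(demo_dir, ".params"), "wb") as handle:
        pickle.dump(fake_params, handle)

    loaded = load_params(demo_dir)
    for name, value in sorted(vars(loaded).items()):
        print(f"{name} = {value}")
```

If the real file stores a custom class instead of a plain namespace, that class must be importable when unpickling, which is one reason such a file is tied to the tool version that wrote it.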

ccmeth commented 3 months ago

Sure. Please see the attachment. Thanks again for your help! isoquant.log

jackfreeman88 commented 3 months ago

Thank you! I've identified the issue and will hopefully patch it within the day!

jackfreeman88 commented 3 months ago

Hi @ccmeth, I have opened a PR that will address this issue! I know Andrey is OOO at the moment, so the changes are on the find_genes branch in case the PR doesn't get merged immediately!

ccmeth commented 3 months ago

Thanks for your help! Look forward to it!

ccmeth commented 3 months ago

Hi, developer. Thanks again for your help. I noticed that two Python files, plot_output.py and post_process.py, have been modified, so I downloaded the updated code and put the files in the src directory. After that, visualize.py ran successfully! I still have one question about the parameters of visualize.py, shown below. The command line I used is "visualize.py PG_VS_AG_new/ --gene_list genelist.txt --viz_output PG_VS_AG_new_output/". However, I saw "Using reference-only based quantification" on the screen.

image

This is a little strange, since I did not use "--ref-only". I checked visualize.py and found that lines 100-101 are related to it. What does output.extended_annotation mean? I could not find its definition.

image

ccmeth commented 3 months ago

Another question concerns read_assignments.tsv. From the instructions, I understood that the "assignment events" column should correspond to "isoform_id". For instance, the "assignment events" value "ism_3,fake_terminal_exon_5:24892-29320" seems to me to refer to two isoforms, separated by a comma. However, I see only one isoform in the "isoform_id" column.

image

andrewprzh commented 3 months ago

Dear @ccmeth

Each line in read_assignments contains only a single read assignment to a single known isoform. Thus, all events relate only to the one isoform indicated in column 4.

If a read is assigned ambiguously (as in your example), it has several entries, each containing information about a single assignment.

Best,
Andrey
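
To make the one-assignment-per-line layout concrete, here is a small sketch that groups isoform IDs by read and reports the reads that have more than one entry. It assumes a tab-separated file with the read ID in column 1 and the isoform ID in column 4, as described above; the remaining column names, the sample rows, and the function name isoforms_per_read are made up for illustration and are not taken from a real run.

```python
import csv
import io
from collections import defaultdict

# Made-up rows: read_1 has two entries (ambiguous), read_2 has one.
SAMPLE = (
    "#read_id\tchr\tstrand\tisoform_id\tgene_id\tassignment_type\tassignment_events\n"
    "read_1\tchr1\t+\ttx_A\tgene_1\tambiguous\tism_3,fake_terminal_exon_5:24892-29320\n"
    "read_1\tchr1\t+\ttx_B\tgene_1\tambiguous\tism_3\n"
    "read_2\tchr1\t-\ttx_C\tgene_2\tunique\tnone\n"
)


def isoforms_per_read(handle):
    """Map each read ID to the list of isoform IDs it was assigned to."""
    assignments = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#'-prefixed header or comment lines.
        if not row or row[0].startswith("#"):
            continue
        read_id, isoform_id = row[0], row[3]  # column 4 holds the isoform
        assignments[read_id].append(isoform_id)
    return dict(assignments)


per_read = isoforms_per_read(io.StringIO(SAMPLE))
ambiguous = {read: isoforms for read, isoforms in per_read.items() if len(isoforms) > 1}
print(ambiguous)
```

The events listed on each row describe only that row's isoform, so a comma-separated events value is a list of events for one assignment, not a list of isoforms.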

andrewprzh commented 3 months ago

@ccmeth

The original visualization issue should be fixed now in IsoQuant 3.5.1.