Xinglab / espresso

Requantify isoforms after filtering #32

Closed: chrisamiller closed this issue 11 months ago

chrisamiller commented 1 year ago

I've run the ESPRESSO steps on a cohort of data, generating a large list of isoforms. I can then apply filtering from something like SQANTI3 to remove the isoforms I believe to be false positives, leaving me with a subset of the original list.
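If the filter gives you a list of transcript IDs to keep rather than a ready-made GTF, a minimal Python sketch of subsetting the GTF might look like this. It assumes standard GTF attribute syntax (`transcript_id "..."` in column 9); the function names `filter_gtf_lines` and `write_filtered_gtf` are hypothetical. Records without a `transcript_id` (such as gene lines) are passed through unchanged, so gene records whose transcripts were all removed may remain.

```python
import re
from typing import Iterable, Iterator, Set

# Match transcript_id at the start of column 9 or right after a ';', so that
# attributes such as ref_transcript_id are not picked up by mistake.
TRANSCRIPT_ID_RE = re.compile(r'(?:^|;)\s*transcript_id "([^"]+)"')


def filter_gtf_lines(lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield the GTF lines that belong to transcripts in keep_ids.

    Comment lines and records without a transcript_id (e.g. gene records)
    are passed through unchanged; blank or malformed lines are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match is None or match.group(1) in keep_ids:
            yield line


def write_filtered_gtf(in_path: str, out_path: str, keep_ids: Set[str]) -> None:
    """Hypothetical helper: stream in_path to out_path, keeping only keep_ids."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, keep_ids))
```

Streaming line by line keeps memory flat even for large cohort-level GTFs.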

After this step, I'd like to keep the filtered isoform GTF static and run just the part of ESPRESSO_Q that assigns reads to these transcripts, quantifies transcript expression, and also spits out whether each read is an FSM, ISM, etc. Is the code modular enough to run just this portion of the process?

EricKutschera commented 1 year ago

I think if you just run the Q step again, but with the new filtered GTF and --read_ratio_cutoff 2, ESPRESSO will try to assign the reads only to the isoforms in the filtered GTF. The output might be a little different if you instead run ESPRESSO from the beginning with the filtered GTF and use --read_ratio_cutoff 2 in both the S and Q steps, since that changes the high-confidence junctions used for realignment. A ratio cutoff of 2 is impossible to meet, so it should restrict ESPRESSO to only the splice junctions and isoforms in the filtered GTF.
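A minimal Python sketch that assembles the Q-step re-run described in this reply. Only --read_ratio_cutoff and --tsv_compt come from this thread; the -L (sample list) and -A (annotation) flag names are assumptions based on ESPRESSO's README usage and should be checked against your ESPRESSO_Q.pl usage message. The helper name and all paths are hypothetical.

```python
import shlex
from typing import List


def build_requant_command(samples_tsv: str, filtered_gtf: str, compat_tsv: str) -> List[str]:
    """Assemble an ESPRESSO_Q.pl call that quantifies against a fixed, filtered GTF.

    --read_ratio_cutoff and --tsv_compt are the options discussed in this thread.
    -L and -A are ASSUMED flag names (sample list, annotation GTF); confirm them
    against the usage message of your ESPRESSO_Q.pl before running.
    """
    return [
        "perl", "ESPRESSO_Q.pl",
        "-L", samples_tsv,           # assumed flag: sample list from the earlier S/C steps
        "-A", filtered_gtf,          # assumed flag: the filtered annotation
        "--read_ratio_cutoff", "2",  # unreachable ratio: restricts ESPRESSO to the GTF's junctions/isoforms
        "--tsv_compt", compat_tsv,   # per-read FSM/ISM/novel classification output
    ]


if __name__ == "__main__":
    cmd = build_requant_command(
        "work_dir/samples.tsv.updated",     # hypothetical path
        "filtered.gtf",                     # hypothetical path
        "work_dir/compatible_isoform.tsv",  # hypothetical path
    )
    # Print a shell-safe command line; to run it for real, use
    # subprocess.run(cmd, check=True) after `import subprocess`.
    print(shlex.join(cmd))
```

Building the argument list rather than a single string avoids shell-quoting problems with paths that contain spaces.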

There's also the --tsv_compt /path/to/output/compatible_isoform.tsv argument in the Q step. That output file shows, for each read, whether it's FSM, ISM, or novel. In the output from the second run, the FSM/ISM information will be based on the filtered GTF.
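To compare per-read classifications between the original and the filtered run, a minimal sketch that tallies the classification column of the --tsv_compt output. The column layout (tab-separated, a header row, a column named `read_classification`) is an assumption; check the header of your own compatible_isoform.tsv and pass a different `column` name if it differs. The function name is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def count_read_classes(handle: TextIO, column: str = "read_classification") -> Dict[str, int]:
    """Count reads per classification label (FSM, ISM, ...) in a --tsv_compt file.

    ASSUMPTION: tab-separated with a header row containing `column`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in header")
    return dict(Counter(row[column] for row in reader))


if __name__ == "__main__":
    # Made-up rows in the assumed layout, not real ESPRESSO output.
    demo = io.StringIO(
        "read_id\tsample_name\tread_classification\tcompatible_isoforms\n"
        "r1\tS1\tFSM\tT1\n"
        "r2\tS1\tISM\tT1\n"
        "r3\tS1\tFSM\tT2\n"
    )
    print(count_read_classes(demo))  # {'FSM': 2, 'ISM': 1}
```

For a real file, open it with `open(path, newline="")` as the csv module recommends, and run the function on each run's output to get a quick before/after comparison.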

chrisamiller commented 1 year ago

Thanks for the quick response! Giving that a try

chrisamiller commented 1 year ago

For others who may find this thread, this approach worked well. Thanks again!