Why did you remove the `--transcriptome_source` denovo option ?

epi2me-labs / wf-transcriptomes

Why did you remove the `--transcriptome_source` denovo option ? #63

Closed pbuendia closed 1 month ago

pbuendia commented 5 months ago

Hi! I would really appreciate an answer to my question, as I work in a lab that used to run "pinfish" for novel isoform discovery without differential expression (DE) analysis. This option was available in previous versions of wf-transcriptomes but was removed in v0.4.0 without any explanation.

Why was the `--transcriptome_source denovo` option removed, and how should one run novel isoform discovery without DE analysis in wf-transcriptomes?

Thank you in advance for your clarifying answer!

Paty

cjw85 commented 5 months ago

We removed this option because the process for creating a de novo transcriptome failed more often than it succeeded. We do not currently offer an alternative. You may wish to look at https://github.com/bcgsc/RNA-Bloom or the suite of tools from https://sahlingroup.github.io/software/

pbuendia commented 5 months ago

Thank you, cjw85, for your clarifying reply! How would you recommend using wf-transcriptomes for novel isoform discovery on 1 sample? And how would you identify the novel isoforms (e.g. "unknowns" in the GTF file)?

cjw85 commented 5 months ago

The workflow requires either a reference transcriptome or a reference genome (which is used to curate a transcriptome from the data). These are the only options now available.

pbuendia commented 5 months ago

@cjw85: Thanks again! This helps a lot! Would you please confirm that the command below, with just a reference genome and one sample, can be used to identify novel isoforms, and that these will appear as MSTRG results as described in this recent issue?

```shell
nextflow run epi2me-labs/wf-transcriptomes \
  --fastq $sample1_fastq \
  --transcriptome_source reference-guided \
  --ref_genome Macaca_mulatta.fna \
  --out_dir $outdir \
  -profile singularity
```

nrhorner commented 5 months ago

Hi @pbuendia

You would need to also supply an annotation file using --ref-annotation.

In the output file gffcompare/str_merged.transcripts*.gff.tmap you will find a list of all transcripts identified. The class_code column refers to gffcompare class codes as defined here: https://ccb.jhu.edu/software/stringtie/gffcompare.shtml.

For instance, entries with code 'u' are entirely novel and have no corresponding annotation in the reference data.
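
As a rough sketch of how the 'u' entries could be pulled out of the `.tmap` table with standard shell tools: the helper below assumes the file is tab-separated with a header row containing a `class_code` column, and it locates that column by name rather than by position. The function name `novel_transcripts` is made up for illustration and is not part of wf-transcriptomes or gffcompare.

```shell
# Hypothetical helper: print the header plus every row whose class_code is 'u'.
# Assumes tab-separated input with a header line that names a class_code column.
novel_transcripts() {
  awk -F'\t' '
    NR == 1 {
      # Find the class_code column by its header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "class_code") col = i
      }
      if (!col) {
        print "no class_code column found" > "/dev/stderr"
        exit 1
      }
      print    # keep the header so the output stays self-describing
      next
    }
    $col == "u"    # default action: print the matching row
  ' "$@"
}
```

For example, `novel_transcripts path/to/file.tmap | wc -l` would report the number of 'u' rows plus one for the header line (the path here is a placeholder).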

pbuendia commented 5 months ago

@nrhorner: Thank you for your reply! We did run it with --ref-annotation (please see the command below and let us know if it looks correct) and got 2603 unknown transcripts. However, a different, older tool, 'pinfish' + subreads, found many more novel isoforms. That is why we are unsure of the best way to identify the novel isoforms, and why we tried to get those MSTRG results. Thanks in advance for any guidance!

```shell
nextflow run epi2me-labs/wf-transcriptomes \
  --fastq $sample1_fastq \
  --ref_genome Macaca_mulatta.fna \
  --ref-annotation Macaca_mulatta.gff \
  --out_dir $outdir \
  -profile singularity
```
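
When comparing counts against another tool, it may help to see how the transcripts are distributed across every gffcompare class code and not only 'u'. The sketch below tallies them, assuming a tab-separated `.tmap` file with a header row that names a `class_code` column; `class_code_counts` is a hypothetical helper name, not something the workflow provides.

```shell
# Hypothetical helper: count the rows for each class_code value.
# Assumes tab-separated input with a header line that names a class_code column.
class_code_counts() {
  awk -F'\t' '
    NR == 1 {
      # Find the class_code column by its header name, then skip the header.
      for (i = 1; i <= NF; i++) {
        if ($i == "class_code") col = i
      }
      next
    }
    col { count[$col]++ }
    END {
      # One line per class code: code, tab, number of rows.
      for (code in count) print code "\t" count[code]
    }
  ' "$@" | sort
}
```

Running `class_code_counts path/to/file.tmap` (placeholder path) prints one line per class code, which can then be read against the definitions on the gffcompare page.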

cjw85 commented 1 month ago

The best answer I can give is to refer you back to my original reply: https://github.com/epi2me-labs/wf-transcriptomes/issues/63#issuecomment-1915242427