pbuendia closed this issue 1 month ago
We removed this option because the process for creating a de novo transcriptome failed more often than it succeeded. We do not currently offer an alternative. You may wish to look at https://github.com/bcgsc/RNA-Bloom or the suite of tools from https://sahlingroup.github.io/software/
Thank you, cjw85, for your clarifying reply! How would you recommend using wf-transcriptomes for novel isoform discovery on 1 sample? And how would you identify the novel isoforms (e.g. "unknowns" in the GTF file)?
The workflow requires either a reference transcriptome or a reference genome (which is used to curate a transcriptome from the data). These are the only options now available.
@cjw85: Thanks again! This helps a lot! Could you please confirm that this command, with just a reference genome and one sample, can be used to identify novel isoforms? And will these appear as MSTRG results, as described in this recent issue?
nextflow run epi2me-labs/wf-transcriptomes \
--fastq $sample1_fastq \
--transcriptome_source reference-guided \
--ref_genome Macaca_mulatta.fna \
--out_dir $outdir \
-profile singularity
Hi @pbuendia
You would need to also supply an annotation file using `--ref-annotation`.
In the output file `gffcompare/str_merged.transcripts*.gff.tmap` you will find a list of all transcripts identified.
The `class_code` column refers to gffcompare class codes as defined here: https://ccb.jhu.edu/software/stringtie/gffcompare.shtml.
For instance, entries with code 'u' are totally novel and have no corresponding annotation in the reference data.
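As a rough sketch of how the 'u' entries could be pulled out of that .tmap file: the snippet below assumes the file is tab-separated and has a header row containing columns named `class_code` and `qry_id` (the usual gffcompare layout, but worth checking against your own output). The function name `novel_transcripts` is made up for illustration and is not part of wf-transcriptomes or gffcompare.

```shell
# Sketch only: print the query transcript IDs that gffcompare labelled 'u'.
# Assumes a tab-separated .tmap file whose header row includes the
# columns class_code and qry_id; novel_transcripts is a hypothetical helper.
novel_transcripts() {
    # $1: path to a single gffcompare .tmap file
    awk -F'\t' '
        NR == 1 {
            # Map header names to column positions instead of hard-coding them.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("class_code" in col) || !("qry_id" in col)) {
                print "expected columns class_code and qry_id not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $col["class_code"] == "u" { print $col["qry_id"] }
    ' "$1"
}
```

Piping the result through `wc -l` gives a count of 'u' transcripts, for example `novel_transcripts path/to/file.tmap | wc -l`, where the path is whichever single .tmap file the workflow wrote to its gffcompare output directory.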
@nrhorner: Thank you for your reply! We did run it with --ref-annotation (please see below and let us know if it looks correct) and got 2603 unknown transcripts. However, a different, older tool, 'pinfish' + subreads, found many more novel isoforms. That is why we are unsure of the best way to identify the novel isoforms, and why we tried to get those MSTRG results. Thanks in advance for any guidance!
nextflow run epi2me-labs/wf-transcriptomes \
--fastq $sample1_fastq \
--ref_genome Macaca_mulatta.fna \
--ref-annotation Macaca_mulatta.gff \
--out_dir $outdir \
-profile singularity
The best answer I can give is to refer you back to my original reply: https://github.com/epi2me-labs/wf-transcriptomes/issues/63#issuecomment-1915242427
Ask away!
Hi! I would really appreciate an answer to my question, as I work in a lab that used to run "pinfish" for novel isoform discovery without differential expression (DE) analysis. This option was available in previous versions of wf-transcriptomes but was removed in v0.4.0, with no explanation of why.
Why was the `--transcriptome_source denovo` option removed, and how should one run novel isoform discovery without DE analysis in wf-transcriptomes? Thank you in advance for your clarifying answer!
Paty