BUSCO (Benchmarking Universal Single-Copy Orthologs) fails with isONform output

NStrowbridge commented 1 year ago

As the title indicates, I recently ran the full isONform pipeline on data generated with the ONT PCR cDNA (SQK-PCS109) kit. Running BUSCO on the output "transcriptome.fastq" gives me the error message "ERROR: The input file does not contain nucleotide sequences.", so BUSCO does not seem to recognize the output as a transcriptome. When looking directly at the transcriptome file, I noticed that each nucleotide sequence is followed by "+ +++++++++++++ (repeating)", which I assume are QC results? I'm not sure whether this has anything to do with it.

If you could point me in the right direction for getting this issue sorted, or perhaps suggest other QC steps for the de novo transcriptome, that would be greatly appreciated.

Kind regards,

Nic Strowbridge, MSc

NStrowbridge commented 1 year ago

Never mind, I realized my mistake! I converted the file from FASTQ to FASTA format, and it is now working. However, I am still interested in any advice you have on further QC steps for de novo transcriptomes.
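
For anyone hitting the same error, a conversion like this can be sketched in a few lines of Python. This is only a minimal sketch, assuming the file uses plain four-line FASTQ records (header, sequence, "+" line, quality line) with no wrapped sequences. The function names `fastq_to_fasta` and `convert_file` are made up for illustration and are not part of isONform or BUSCO.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records (assumed layout)."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, separator, _quality = record
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"Malformed FASTQ record: {header!r}")
            # Keep the read name, drop the "+" line and the quality string.
            yield ">" + header[1:]
            yield sequence
            record = []
    if record:
        raise ValueError("Truncated FASTQ record at end of input")


def convert_file(fastq_path: str, fasta_path: str) -> None:
    """Hypothetical helper: stream a FASTQ file into a FASTA file."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for out_line in fastq_to_fasta(src):
            dst.write(out_line + "\n")
```

Calling `convert_file("transcriptome.fastq", "transcriptome.fasta")` would then write a FASTA file next to the original; the output file name here is just an assumption.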

ksahlin commented 1 year ago

Hi Nic,

You are correct.

On that note, @aljpetri we should output fasta files of the transcriptome instead of fastq (quality values are long gone by this stage).

aljpetri commented 1 year ago

Hi, how did the BUSCO analysis go? Please feel free to give feedback should you find anything odd. I have now rolled out a new release of isONform that outputs fasta files by default instead of fastq.

Concerning other QC steps for transcriptomes: this depends mainly on which data you have available. Should you have a reference for the organism you are interested in, you could try running SQANTI to learn more about the transcriptome. If you do not have any additional data available, I do not know of any QC steps that you could perform.

Best,
Alex

aljpetri / isONform

BUSCO (Benchmarking Universal Single-Copy Orthologs) fails with isONform output #13