PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains a list of PacBio packages available via conda.
BSD 3-Clause Clear License

pigeon classification doesn't work (Chinese Hamster) #682

Closed yesh1m closed 1 month ago

yesh1m commented 2 months ago
  1. Sample Data : Kinnex Full-length RNA, "collapsed.sorted.gff" from SMRTLink v13.0 & command line

  2. Reference Data : Chinese hamster (CriGri_1.0) ; downloaded from Ensembl genomic.modified.sorted.zip

  3. Used command: pigeon classify -d classify --log-level TRACE --log-file classify.log collapsed.sorted.gff ./genomic.modified.sorted.gtf ./Cricetulus_griseus_crigri.CriGri_1.0.dna.toplevel.fa

  4. Problem : The classification job does not produce the final classification.txt, summary.txt, or report.json. Only a tmp file with the scaffold name is written (created up to JH000801).

The total number of transcripts is 454,686, but the log stopped at 405,800.

  5. Could you please review the GTF and find the root cause of this?

WillCh07 commented 2 months ago

How did you get the "collapsed.sorted.gff" from SMRT Link? Did you use the same reference?
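
One way to check whether the annotation and the reference agree is to compare their sequence names. The following is a minimal Python sketch, not part of pigeon; the function names are hypothetical, and it assumes a tab-separated GFF/GTF whose first column is the sequence name and a FASTA whose headers start with `>` followed by the name.

```python
def fasta_names(lines):
    """Collect sequence names (first token after '>') from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_names(lines):
    """Collect sequence names (column 1) from GFF/GTF lines, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_fasta(annotation_lines, fasta_lines):
    """Return annotation sequence names that have no matching FASTA record."""
    return annotation_names(annotation_lines) - fasta_names(fasta_lines)


# Hypothetical usage with the file names from this thread:
# with open("collapsed.sorted.gff") as gff, \
#         open("Cricetulus_griseus_crigri.CriGri_1.0.dna.toplevel.fa") as fa:
#     print(sorted(names_missing_from_fasta(gff, fa)))
```

An empty result would suggest the names are consistent; any names listed would be worth checking first.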

yesh1m commented 2 months ago

They got the file from the SMRT Link output directory, using the same reference FASTA file from Ensembl. They also tried the analysis on the command line, starting from hifi.bam, but encountered the same issue.

WillCh07 commented 2 months ago

Have they checked their analysis status, such as RAM usage (e.g., with htop)? It looks like there may be too many scaffolds and contigs to complete the analysis.
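
As an alternative to watching htop, the run can be wrapped so that the exit status and peak memory are recorded. This is a rough Python sketch using only the standard library (Unix only); `run_and_report_peak_memory` is a hypothetical helper, not a pigeon feature.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion; return (exit_code, peak_rss) for child processes.

    peak_rss is ru_maxrss of the largest waited-for child: kilobytes on Linux,
    bytes on macOS. A negative exit code -N means the process was terminated
    by signal N (for example -9 for SIGKILL, which the Linux OOM killer sends).
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Hypothetical usage with the command from this thread:
# code, peak = run_and_report_peak_memory([
#     "pigeon", "classify", "-d", "classify",
#     "--log-level", "TRACE", "--log-file", "classify.log",
#     "collapsed.sorted.gff", "./genomic.modified.sorted.gtf",
#     "./Cricetulus_griseus_crigri.CriGri_1.0.dna.toplevel.fa",
# ])
# print(f"exit code: {code}, peak RSS: {peak}", file=sys.stderr)
```

An exit code of -9 together with a peak close to the machine's total RAM would be consistent with the process being killed for running out of memory, though it would not prove it.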

yesh1m commented 2 months ago

Yes, after a certain JH# scaffold, the process disappears without any message. Do you think specifying the number of threads to use via the '-j' option might address this issue?

armintoepfer commented 1 month ago

We can't help you with data processing. If you provide a small reproducible test case, we can have a look. Closing until reproducible case is available.
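
A small test case could be built by keeping only the records for a few scaffolds, for example those around the point where the log stopped. Below is a minimal Python sketch under that assumption; `subset_by_seqname` is a hypothetical helper, it assumes tab-separated GFF/GTF with the sequence name in column 1, and the reference FASTA would need to be reduced to the same scaffolds separately.

```python
def subset_by_seqname(lines, keep):
    """Yield header/comment lines plus records whose column 1 is in keep."""
    for line in lines:
        if line.startswith("#"):
            # Preserve headers and comments so the file stays well-formed.
            yield line
            continue
        seqname = line.split("\t", 1)[0]
        if seqname in keep:
            yield line


# Hypothetical usage: keep one scaffold; the output file name is a placeholder,
# and the exact name must match column 1 of the input.
# keep = {"JH000801"}
# with open("collapsed.sorted.gff") as src, open("subset.gff", "w") as dst:
#     dst.writelines(subset_by_seqname(src, keep))
```

The same filter can be applied to the GTF, since it relies only on the first column.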