HuntsmanCancerInstitute / USeq

180+ Java applications for analyzing next generation sequencing data from ChIPSeq, RNASeq, BisSeq, DNASeq, variant annotation/filtering, alignment/VCF QC, capture array design, IGV/DAS2/IGB/UCSC file manipulation, etc. Both GUI and command-line interfaces.
http://bioserver.hci.utah.edu/USeq/Documentation/

AlignmentEndTrimmer can't read C. elegans chrX #4

Closed reichdp closed 6 years ago

reichdp commented 6 years ago

I've been using the USeq application AlignmentEndTrimmer to trim and filter RNAseq datasets for detection of A-to-I RNA editing, and I noticed that it cannot process reads on chrX. Every time I run it, it processes most reads just fine but always throws the error "Chromosome chrX not found in reference file, skipping trimming step". As a result, I end up with reads on chrX that contain too many mismatches and should have been filtered out.

I know chrX is in the reference fasta file, and I can't find anything that would prevent chrX from being read. I have tried running this with two different reference .fasta files, one that I made and one on the HCI-Uinta server (/home/Genomes/Worm/Ce10/Fasta/combineFile.fasta), and I get the same error with both. I've also tried downloading the most recent USeq version and running the application with and without the options "-e -s -x 1", but the issue persists. So I suspect the error is within the application itself, and not in the input files I have provided.
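For anyone reproducing this, one way to confirm that a chromosome name really is present in a reference is to list the FASTA header names directly. The following is a minimal sketch, not part of USeq: the class name `FastaNames` and the method `sequenceNames` are hypothetical, and it assumes a plain-text FASTA in which the sequence name is the text after `>` up to the first whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of USeq: lists the sequence names in a FASTA.
public class FastaNames {

    /** Returns the sequence names (text after '>' up to the first whitespace), in file order. */
    static List<String> sequenceNames(Reader in) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Drop the '>' and keep only the first whitespace-delimited token.
                    String header = line.substring(1).trim();
                    names.add(header.split("\\s+")[0]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file; otherwise use a tiny in-memory example.
        Reader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader(">chrI\nACGT\n>chrX example description\nGGCC\n");
        List<String> names = sequenceNames(in);
        System.out.println(names);
        System.out.println("contains chrX: " + names.contains("chrX"));
    }
}
```

If chrX appears in this list but the tool still reports it as missing, the problem is more likely in the application than in the reference file.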

I have made a folder on the HCI-Uinta server (/scratch/dreich/Dan/aetERROR) containing a shell file to process an example .bam file (with the two reference fasta files) and a log file with the output of the run. Feel free to email me (reichdp@gmail.com) with any questions.

DavidAustinNix commented 6 years ago

Looks like this is a major bug in the app: the last chromosome of data was not being parsed and trimmed. If you have used this app, reprocess your data. (Note: Nix, the primary author of USeq, didn't write this tool.) It has been fixed, and the fix will be incorporated into the next USeq release.
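The thread does not show the actual fix, so the following is only an illustrative sketch of how a "last chromosome skipped" bug commonly arises, not the AlignmentEndTrimmer code. It assumes records arrive sorted by chromosome and are processed in per-chromosome batches whenever the chromosome name changes; the class `LastGroupFlush`, the method `processedChromosomes`, and the `flushLast` switch are all hypothetical names. If the final batch is not flushed after the loop ends, the last chromosome is never processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT the USeq AlignmentEndTrimmer source.
public class LastGroupFlush {

    /**
     * Walks chromosome-sorted records and returns the chromosomes whose batch was processed.
     * A batch is processed when the chromosome name changes; flushLast controls whether
     * the final batch is also processed once the input is exhausted.
     */
    static List<String> processedChromosomes(List<String> readChroms, boolean flushLast) {
        List<String> processed = new ArrayList<>();
        String current = null;
        int batchSize = 0;
        for (String chrom : readChroms) {
            if (current != null && !chrom.equals(current)) {
                processed.add(current); // chromosome changed: process the finished batch
                batchSize = 0;
            }
            current = chrom;
            batchSize++;
        }
        // Without this step the last chromosome's batch is silently dropped.
        if (flushLast && batchSize > 0) {
            processed.add(current);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("chrI", "chrI", "chrII", "chrX", "chrX");
        System.out.println(processedChromosomes(reads, false)); // prints [chrI, chrII]
        System.out.println(processedChromosomes(reads, true));  // prints [chrI, chrII, chrX]
    }
}
```

In this sketch the omission affects whichever chromosome comes last in the sorted input, which would be consistent with chrX being the one affected in the C. elegans data.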