MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Tiny-count error with Sequence-Based Counting Mode #332

Open vicgarcas opened 6 months ago

vicgarcas commented 6 months ago

Hi, I'm having problems running tinyRNA on my samples. I first tried running it with a .gtf reference annotation and, as far as I can tell, it went ok. Then I tried running it in the "Sequence-Based Counting Mode" using only reference genomes, and I ran into problems. I suspect the problem is in my features.csv files, but I'm not sure, and I don't know how to fix it.

To give a bit more information on what I'm doing / want to do, my goal is to:

1) run the samples, which come from C. elegans, against the reference genome of another organism, which was used to infect the worm. In this case I cannot use a reference annotation. Looking at the .sam files, it looks like bowtie ran correctly (I have a lot of aligned sequences in my infected samples and no aligned sequences in my control samples). However, I get an error when it reaches tiny-count:

("Error collecting output for parameter 'alignment_stats': ../../miniconda3/envs/tinyrna/lib/python3.10/site-packages/tiny/cwl/tools/tiny-count.cwl:116:7: Did not find output file with glob pattern: ['OrV_ERT54_reinfections_OrV_ref_alignment_2024-04-09_13-59-03_alignment_stats.csv'].", {})
[2024-04-09 13:59:34] WARNING [job tiny-count] completed permanentFail

I tried using the same features.csv that I used for the reference annotation (see below) and another features.csv, which you suggested in response to someone else's question on GitHub:

Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length
,,unk,,,1,Partial,0,both,all,all

Both of them failed at the same point, which as far as I can tell is before or at the beginning of tiny-count. Do you have any advice?

2) Another, less important issue: if possible, I'd also like to run the samples against the reference genome of the worm. This isn't really necessary, given that it worked with the reference annotation, but in case it's an easy fix... I'm using the same features.csv file that I used for the reference annotation, which I think I got from here or from your lab's webpage:

Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length
Class,risiRNA,rRNA,,,1,Partial,,both,all,all
Class,miRNA,miRNA,,,2,"5' anchored, 0, 4",,sense,all,all
Class,piRNA,piRNA,,,2,"5' anchored, 0, 4",,sense,all,18-21
Class,CSR,CSR Class 22G,,,3,Partial,,antisense,G,21-23
Class,WAGO,WAGO Class 22G,,,3,Partial,,antisense,G,21-23
Class,ALG,ALG Class 26G,,,3,Partial,,antisense,G,26
Class,ERGO,ERGO Class 26G,,,3,Partial,,antisense,G,26
Class,ALG,ALG target 22G,,,3,Partial,,antisense,G,21-23
Class,ERGO,ERGO target 22G,,,3,Partial,,antisense,G,21-23
Class,unk,unclassifed 22G,,,3,Partial,,antisense,G,21-23
Class,unk,unclassifed siRNA,,,4,Partial,,both,all,21-24

As far as I can tell it goes ok here until it reaches deseq, because the tiny-count output files seem ok. However, in the deseq output files it looks like everything was classified as rRNA - I don't have any information regarding loci (see attached deseq .csv table).

OrV_ERT54_reinf_ref_genome_12h_2024-04-09_12-29-48_cond1_OrV_cond2_Control_deseq_table.csv

Operating System: Mac OS Ventura 13.4
tinyrna version: 1.5.0

Thank you in advance for your time and for developing this!

Victoria

AlexTate commented 6 months ago

Hi @vicgarcas,

For the first issue, can you please compress the /config and /logs subdirectories in your Run Directory and attach them here? For future reference, the exported terminal output from a failed run can also help us with troubleshooting. You can obtain it from the menu bar at the top of the screen while the terminal is the foreground window: Shell > Export Text As... For now, let's start with /config and /logs and go from there.
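If it helps, the two subdirectories can be bundled into a single attachment with a few lines of Python. This is only a convenience sketch and not part of tinyRNA: the function name `zip_run_subdirs` and the archive name are made up for illustration, and it assumes the Run Directory contains subdirectories named `config` and `logs` as described above.

```python
import zipfile
from pathlib import Path


def zip_run_subdirs(run_dir, out_zip, subdirs=("config", "logs")):
    """Write the files under the given subdirectories of run_dir into one zip.

    Hypothetical helper for illustration; tinyRNA does not provide it.
    Paths inside the archive are stored relative to run_dir.
    """
    run_dir = Path(run_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for sub in subdirs:
            sub_path = run_dir / sub
            if not sub_path.is_dir():
                continue  # skip a subdirectory that this run did not produce
            for path in sorted(sub_path.rglob("*")):
                if path.is_file():
                    zf.write(path, path.relative_to(run_dir).as_posix())
    return Path(out_zip)
```

Calling `zip_run_subdirs("path/to/run_directory", "tinyrna_debug.zip")` (both paths are placeholders) would produce one file to attach. The archive is best written outside the Run Directory so it is not picked up on a later pass.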

For the second issue, the DGE table looks like what I would expect for the run you described.

All of that being said, the Features Sheet that you attached was originally intended for feature-based counting so the results won't make sense here. It might be more useful to use specific sequences of interest in your reference sequences file, and then tailor your Features Sheet accordingly.
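When editing a Features Sheet by hand, a quick structural check can rule out the simplest kind of mistake before a run. The sketch below is not a tinyRNA tool and does not validate the meaning of any rule; it only assumes that the sheet is plain CSV and that every row should have the same 11 fields as the header quoted in this thread. The function name `check_features_sheet` is hypothetical.

```python
import csv
import io

# Number of header fields in the Features Sheets quoted in this thread.
EXPECTED_COLUMNS = 11


def check_features_sheet(text, expected=EXPECTED_COLUMNS):
    """Return (row_number, field_count) for each non-empty row of the wrong width."""
    problems = []
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if row and len(row) != expected:
            problems.append((row_number, len(row)))
    return problems


sheet = (
    "Select for...,with value...,Classify as...,Source Filter,Type Filter,"
    "Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length\n"
    ",,unk,,,1,Partial,0,both,all,all\n"
    'Class,miRNA,miRNA,,,2,"5\' anchored, 0, 4",,sense,all,all\n'
)
print(check_features_sheet(sheet))  # an empty list means every row has 11 fields
```

The `csv` module is used rather than a plain split on commas because quoted values such as "5' anchored, 0, 4" contain commas of their own.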

taimontgomery commented 6 months ago

Hi @vicgarcas,

Another approach you might consider is to add the genome sequences of the organisms you're using for infection to the C. elegans genome sequence file. Then, when bowtie runs, it will capture reads from C. elegans and the other species simultaneously, so there is no need for separate runs. If you add the name and the start and end coordinates of each of the other species' chromosomes as entries in your GFF/GTF file, and give them a unique Type in column 3 of the GFF/GTF, tiny-count will tally those reads separately, provided they are specified as distinct rules in the Features Sheet. We'd be happy to provide further guidance if you choose this approach.

This won't necessarily solve the issue, but if you share the files Alex requested, I'm sure we can get to the bottom of it.
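For the combined-genome approach, the extra annotation entries could be generated along these lines. This is a rough sketch under several assumptions, not a tinyRNA utility: the function names, the `viral_genome` type, the `custom` source, the `+` strand, and the bare `ID` attribute are all placeholders, and the attributes that tinyRNA actually expects should be checked against its documentation. It follows the standard GFF layout, in which the feature type is the third tab-separated column.

```python
def fasta_lengths(lines):
    """Yield (sequence_name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first word of the header as the name, as aligners commonly do.
            name, length = line[1:].split()[0], 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def whole_sequence_gff_lines(lengths, feature_type, source="custom"):
    """Yield one GFF line per sequence, spanning it from position 1 to its end."""
    for name, length in lengths:
        fields = [
            name,            # column 1: sequence name, must match the FASTA header
            source,          # column 2: source (placeholder value)
            feature_type,    # column 3: feature type, matched by the Type Filter
            "1",             # column 4: start (1-based)
            str(length),     # column 5: end
            ".",             # column 6: score
            "+",             # column 7: strand (an assumption; adjust as needed)
            ".",             # column 8: phase
            f"ID={name}",    # column 9: attributes (minimal placeholder)
        ]
        yield "\t".join(fields)


# Made-up sequences standing in for the infecting organism's genome file.
fasta = [">virus_RNA1 hypothetical segment", "ACGTACGT", "ACGT", ">virus_RNA2", "GGGCCC"]
for gff_line in whole_sequence_gff_lines(fasta_lengths(fasta), feature_type="viral_genome"):
    print(gff_line)
```

The printed lines would be appended to the existing C. elegans GFF/GTF, and a Features Sheet rule with the same value in its Type Filter column would then pick them up as a separate class.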