Hi @kittyBS, thank you for reaching out, and thank you for providing your configuration files and system info.
The file tinyrna-13652.txt tells me that at least one of the SAM files produced by bowtie is empty. Based on the timestamps, a possible cause for this is that a lot of reads are being lost during the fastp step. I would recommend:
Hello, thank you for your quick reply. As you said, in some samples there are no reads left after adapter cleaning. However, I could not find the specific adapter sequence in the log file.
'''
Read1 before filtering:
total reads: 60538173
total bases: 9139350206
Q20 bases: 8008839684(87.6303%)
Q30 bases: 7423335694(81.2239%)
Read1 after filtering:
total reads: 0
total bases: 0
Q20 bases: 0(-nan%)
Q30 bases: 0(-nan%)
'''
Can I create a bowtie index and print the sam files myself? Or can I do the trimming myself?
Note: I solved the problem by providing the adapter sequence myself in run_config_template.yml, but now, as far as I understand, I am having a memory problem, even though, if I am reading the output correctly, I have enough memory in the system.
free -h
       total   used    free    shared  buff/cache  available
Mem:   125Gi   11Gi    41Gi    137Mi   72Gi        113Gi
Swap:  8.0Gi   8.0Gi   23Mi
tinyrna-13654.txt
Can I create a bowtie index and print the sam files myself? Or can I do the trimming myself?
These three tasks are handled automatically during end-to-end runs according to the settings in your Run Config and Paths File. You can also run tiny-count by itself if you have SAM files that were prepared outside of the pipeline.
Note: I solved the problem by providing the adapter sequence
Please double check that the problem has been resolved. The original error is no longer displayed, but tinyrna-13654.txt shows the pipeline stopping at an earlier step than in tinyrna-13652.txt. The tiny-collapse runtimes suggest that a lot of reads are still being lost during the fastp step. I suspect the original error would have been produced if the cluster workload manager (slurmstepd) had not interfered.
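One way to double check is to read the fastp summary in the log and compare the read totals before and after filtering. The following is a minimal sketch, not part of tinyRNA: the function names are hypothetical, and it assumes the summary lines look like the ones quoted earlier in this thread ("Read1 before filtering: total reads: ...").

```python
import re

# Matches the per-read summary blocks as they appear in the log quoted above,
# e.g. "Read1 before filtering: total reads: 60538173".
SUMMARY = re.compile(r"(Read\d+) (before|after) filtering:\s*total reads:\s*(\d+)")


def fastp_read_counts(log_text):
    """Map (read, stage) -> total reads for every summary block found."""
    return {
        (read, stage): int(total)
        for read, stage, total in SUMMARY.findall(log_text)
    }


def retained_fraction(counts, read="Read1"):
    """Fraction of reads that survived filtering (0.0 if none went in)."""
    before = counts[(read, "before")]
    after = counts[(read, "after")]
    return after / before if before else 0.0


# Excerpt of the summary posted earlier in this thread.
log = """Read1 before filtering:
total reads: 60538173
total bases: 9139350206
Read1 after filtering:
total reads: 0
total bases: 0
"""

counts = fastp_read_counts(log)
if retained_fraction(counts) == 0.0:
    print("No reads survived fastp; check the adapter settings before aligning.")
```

A retained fraction near zero for any sample would be consistent with the empty SAM file from the first log.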
as far as I understand, I am having a memory problem
slurmstepd: error: Detected 1 oom_kill event
I agree with your conclusion here. The memory readings you gave should be more than adequate for mm10 and 4 samples. The last line of your log mentions slurmstepd, which is your cluster workload manager, and I'm guessing that it's configured with a memory limit that isn't high enough. It isn't a component of tinyRNA, so you will need to speak with your system administrator about configuration for data-intensive jobs like running tinyRNA.
Hi @kittyBS, To add to Alex's comments, you can reduce the memory usage substantially by running the samples sequentially rather than in parallel. To do so, all you need to do is change line 37 of run_config.yml to:
run_parallel: false
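If you would rather not depend on the line number, the same change can be scripted. This is only a sketch under the assumption that the key appears once, on its own line, as `run_parallel: <value>`; the helper name is hypothetical and is not part of tinyRNA. It edits the text directly so that any comments in the file are preserved.

```python
import re


def set_run_parallel(config_text, enabled):
    """Return config_text with the run_parallel value replaced.

    Only the value is rewritten; indentation and trailing comments are kept.
    """
    value = "true" if enabled else "false"
    pattern = re.compile(r"^([ \t]*run_parallel[ \t]*:[ \t]*)\S+", re.MULTILINE)
    new_text, replaced = pattern.subn(lambda m: m.group(1) + value, config_text)
    if replaced == 0:
        raise KeyError("run_parallel key not found in the config text")
    return new_text


# Made-up config excerpt for illustration; only run_parallel comes from this thread.
example = "some_setting: 4\nrun_parallel: true  # process samples concurrently\n"
print(set_run_parallel(example, enabled=False))
```

To apply it to a real file, read run_config.yml into a string, pass it through the function, and write the result back.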
I also noticed in your features.csv, you specify "Class" with "any" value. Instead, based on your GTF you should specify each class you're interested in and give an identifier ("Classify as"). And you should also specify the desired "Overlap" between the reads and the features. For example,
Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length
Class,miRNA,miRNA_gene,,,1,nested,0,both,all,all
Class,lincRNA,lincRNA,,,2,nested,0,both,all,all
You should probably also include mature miRNAs, which I noticed are absent from your GTF, so you can compare the results to your previous analysis. If you're interested in the mature miRNAs, you can download a GFF3 file from miRBase and use the following rule within your features sheet to capture only mature miRNAs:
Type,miRNA,miRNA,,,1,5' anchored,0,Sense,all,all
A hierarchy value of 1 will ensure that miRNA reads are not counted toward other features. Let us know if you need any further clarification. And if you run into any more issues we're happy to help you out.
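Because the features sheet is a plain CSV file, a quick check before a run can catch rows with a missing or extra comma. The sketch below is an illustration only, not how tinyRNA validates the sheet: the function name is hypothetical, and the only checks are that each rule has as many columns as the header and that Hierarchy is a whole number.

```python
import csv
import io


def check_features_sheet(csv_text):
    """Return a list of human-readable problems; an empty list means none found."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, rules = rows[0], rows[1:]
    hierarchy_col = header.index("Hierarchy")
    problems = []
    for number, row in enumerate(rules, start=1):
        if len(row) != len(header):
            problems.append(
                f"rule {number}: expected {len(header)} columns, found {len(row)}"
            )
        elif not row[hierarchy_col].isdigit():
            problems.append(f"rule {number}: Hierarchy should be a whole number")
    return problems


# The example rules from this thread.
sheet = (
    "Select for...,with value...,Classify as...,Source Filter,Type Filter,"
    "Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length\n"
    "Class,miRNA,miRNA_gene,,,1,nested,0,both,all,all\n"
    "Class,lincRNA,lincRNA,,,2,nested,0,both,all,all\n"
)
print(check_features_sheet(sheet) or "no problems found")
```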
Hello, sorry for the late reply, I was having a system problem. I'll change the parallel job execution and edit the GTF file, and I'd also like to ask for some additional help regarding your comments on features.csv. Actually, I want to see the percentage of each kind of RNA found in my samples rather than one specific RNA type, so I avoided giving a specific RNA type. Will giving a specific class as you suggest achieve this? When I run it with "any", the class chart only contains "Unassigned" and "Unknown".
Hi @kittyBS, It looks like you have 13 classes of ncRNA features, listed below. I suggest that you make a rule in your features sheet for each Class. Most of the files output by tinyRNA will distinguish each of the classes, but the total reads assigned to all of these features will also be reported by tiny-count in the alignment_stats.csv spreadsheet (see Total Assigned Reads). Scatter plots showing the difference in counts between your two conditions will be generated by tiny-plot, either with the classes indicated (see the output in the scatter_by_dge_class folder) or without (see the output in the scatter_by_dge folder).
But if you really don't care about distinguishing the classes and want to capture counts for all overlapping reads, you can have a single rule in your features.csv:
Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Overlap,Mismatches,Strand,5' End Nucleotide,Length
,,ncRNA,,,1,Partial,0,both,all,all
This will classify all features as ncRNA, which will accomplish what you want.
Classes: snRNA, lincRNA, miRNA, snoRNA, misc_RNA, rRNA, scaRNA, bidirectional_promoter_lncRNA, 3prime_overlapping_ncRNA, sRNA, scRNA, Mt_tRNA, Mt_rRNA
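Writing 13 rules by hand is error-prone, so the per-class rules could be generated instead. This is a rough sketch and not a tinyRNA utility: the function name is hypothetical, the row layout copies the Class rules shown earlier in this thread, and the defaults (Hierarchy 1 and "nested" overlap for every class) are placeholder assumptions that you should adjust to your own priorities.

```python
import csv
import io

HEADER = [
    "Select for...", "with value...", "Classify as...", "Source Filter",
    "Type Filter", "Hierarchy", "Overlap", "Mismatches", "Strand",
    "5' End Nucleotide", "Length",
]

# The 13 classes listed above.
CLASSES = [
    "snRNA", "lincRNA", "miRNA", "snoRNA", "misc_RNA", "rRNA", "scaRNA",
    "bidirectional_promoter_lncRNA", "3prime_overlapping_ncRNA", "sRNA",
    "scRNA", "Mt_tRNA", "Mt_rRNA",
]


def build_class_rules(classes, hierarchies=None, overlap="nested"):
    """Return features-sheet CSV text with one Class rule per class name.

    hierarchies optionally maps a class name to its Hierarchy value;
    classes missing from it fall back to 1 (an assumption, not a recommendation).
    """
    hierarchies = hierarchies or {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for name in classes:
        writer.writerow([
            "Class", name, name, "", "",
            hierarchies.get(name, 1), overlap, 0, "both", "all", "all",
        ])
    return out.getvalue()


rules_csv = build_class_rules(CLASSES, hierarchies={"lincRNA": 2})
print(rules_csv)
```

The printed text can be saved as features.csv and then edited by hand, for example to change the "Classify as..." value for a class.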
Hello, thanks for your help. Based on your previous comments: 1. Changed line 37 of run_config.yml to run_parallel: false
Hi @kittyBS, Glad you were able to get it to work. The "Unknown" class will include all the features in your GTF, and "Unassigned" are the reads that aligned to the genome but did not overlap with any of the features in your GTF. You can set the class name by changing the "Classify as" value to whatever you choose, such as ncRNA. If you don't specify a name, it will default to "Unknown". See https://github.com/MontgomeryLab/tinyRNA/blob/master/doc/Configuration.md#features-sheet-details for details.
Hello, I'm new to the field of Informatics, so please excuse my question. I am trying to use your tool to see the differences in RNA types in the samples I have previously analyzed for microRNA, but I cannot correct the error in the attachment. I am attaching the gtf file I edited, feature, sample and the error output to help you. Also I couldn't figure out how to create feature.csv. Thank you in advance for your interest.
The system I use is Ubuntu 20.04.6, and the tinyRNA version is v1.5.0.
tinyrna.zip