bioinfoUGR / sRNAtoolbox

small RNA analysis programs: standalone jar files (sRNAbench, sRNAde, etc), manuals and Docker image
MIT License

Hairpin Counts Are Slightly Off Problem #23

Open aliNorgen opened 1 year ago

aliNorgen commented 1 year ago

Hi All,

I am running into a very strange problem while using sRNAbench from within a Docker container. Since I have nearly exhausted all the possible sources of this error, I thought reaching out here on GitHub might help me solve it.

Basically, I am having issues with the 'hairpin' annotations when running sRNAbench on squirrel samples. The hairpin stats are close, but they are still slightly off, and I want to make sure they are correct.

I have fixed the warnings I was getting in my log files, but the hairpin mapping counts (both sense and antisense) are still off, and the cause is difficult to understand and diagnose.

I must be missing something. Can anyone tell me whether I might be missing a tool or package that could affect the hairpin annotation calculations? I checked my database and the 'hairpin.fa' files, and everything seems to check out, so I am quite confused as to why this hairpin bug is occurring.

I am on sRNAbench v1.6 and using the latest version of Docker. Note that everything else in my pipeline works flawlessly: even the 'mature' microRNAs behave as intended, and all the other microRNAs and cDNAs have correct annotation stats. So I do not understand why only the hairpins are affected.
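A quick way to rule out a mismatch between the reference files on the host and inside the container is to fingerprint each copy of 'hairpin.fa' (and the other database files) in both environments and compare the results. The following is a minimal Python sketch written for that purpose; the function and script names are hypothetical and not part of sRNAtoolbox. It hashes the raw bytes, so even a line-ending change (CRLF vs LF) will produce a different digest.

```python
import hashlib
import sys
from pathlib import Path


def fasta_fingerprint(path):
    """Return (SHA-256 hex digest, number of '>' header lines) for a FASTA file."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    # Every FASTA record begins with a '>' header line.
    n_records = sum(1 for line in data.splitlines() if line.startswith(b">"))
    return digest, n_records


if __name__ == "__main__":
    # Hypothetical usage: python fasta_fingerprint.py hairpin.fa mature.fa
    for p in sys.argv[1:]:
        digest, n = fasta_fingerprint(p)
        print(f"{p}\t{n} records\t{digest}")
```

Running it once on the host and once inside the container, then checking that the record counts and digests agree, confirms whether both runs really see identical reference files.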

EDIT ON JULY 24, 2023 @ 09:30 (24-hr time format)

I believe the issue has something to do with the alignment performed for the 'hairpin' microRNAs. For some reason, the reads per million (RPM) differ between the run inside my Docker image and the runs on the host machine. Thankfully, this is not the case for the other microRNAs and the cDNAs; only the 'hairpin' microRNAs produce different RPMs.

I thought I would add this information in case it is useful. I am investigating my use of 'bowtie1' and 'STAR' in my Docker image to (hopefully) narrow down this nasty bug.
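Because RPM is a raw count scaled by a library-size denominator (RPM = count / total × 10^6), an RPM difference can come from either term. The Python sketch below separates the two; the function names are hypothetical. It assumes that sRNAbench normalises every annotation in a given run by the same library size, which I have not confirmed for v1.6. If that assumption holds, the fact that the mature and cDNA RPMs match would imply an identical denominator, pointing at the raw hairpin counts (and therefore the alignment step) as the source of the difference.

```python
def rpm(count, total_reads):
    """Reads per million: count scaled to a library of one million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


def explain_rpm_difference(count_a, total_a, count_b, total_b):
    """Report whether an RPM difference comes from the raw count, the denominator, or both."""
    causes = []
    if count_a != count_b:
        causes.append("raw count")
    if total_a != total_b:
        causes.append("normalisation denominator")
    return {
        "rpm_a": rpm(count_a, total_a),
        "rpm_b": rpm(count_b, total_b),
        "causes": causes,
    }
```

For example, with made-up values, `explain_rpm_difference(500, 2_000_000, 480, 2_000_000)` reports RPMs of 250.0 and 240.0 and a single cause, `['raw count']`.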

EDIT ON JULY 24, 2023 @ 11:47 (24-hr time format)

I copied 'STAR' and 'bowtie1' straight from my Ubuntu host machine instead of using the pre-compiled binaries from their official GitHub releases pages. Unfortunately, this did not change the 'hairpin' results: I am still getting slightly different 'hairpin' counts between the Ubuntu host-machine run and the Docker run. The other microRNAs and cDNAs still match.

EDIT ON JULY 27, 2023 @ 13:55 (24-hr time format)

To make diagnosing this issue a little easier, I have screen-captured the summarised mapping stats from both the Docker run and the local run so the differences can be compared.

Here is a screen-capture of the summarised mapping stats from the local run (this is the correct run):

[Screenshot: mappingStatSummary_local]

Here is a screen-capture of the summarised mapping stats from the Docker run (this is the run with the slightly different hairpin stats):

[Screenshot: mappingStatSummary_docker]

Hopefully these two images help us diagnose and fix the issue.
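Screenshots are hard to compare cell by cell. If the same summaries can be exported as tab-separated text, a small script can list exactly which values differ between the two runs. The layout assumed here (a header row, with a row key such as the annotation name in the first column) is hypothetical and may not match sRNAbench's actual output files; the function names are also illustrative.

```python
import csv


def load_summary(path):
    """Read a tab-separated table whose first column is a row key and whose
    header row names the remaining columns. Returns {key: {column: value}}."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        table = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


def diff_summaries(local, docker):
    """Yield (key, column, local_value, docker_value) for every cell that differs."""
    for key in sorted(set(local) | set(docker)):
        cols = sorted(set(local.get(key, {})) | set(docker.get(key, {})))
        for col in cols:
            a = local.get(key, {}).get(col)
            b = docker.get(key, {}).get(col)
            if a != b:
                yield key, col, a, b
```

Load each exported file with `load_summary(...)` and print the tuples yielded by `diff_summaries(local, docker)`. Values are compared as text, so `100` and `100.0` would be flagged as different.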

EDIT ON JULY 28, 2023 @ 09:18 (24-hr time format)

I have no idea why I did not share the 'hairpin' and 'mature' sequence files I was using when I hit this problem, but here they are:

https://uoguelphca-my.sharepoint.com/:f:/g/personal/ajawad_uoguelph_ca/EvVzKgHOZF5Is59-Y4cOQaABUsys-ordJUWbANSgNixvuA?e=2KRh1J

You will need to download the files from the OneDrive link because GitHub does not allow me to attach '.fa' files directly to this post.

Many thanks, Ali