What is the unit of count data from Annotation module?

cougarlj / COMPSRA

COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis

https://regepi.bwh.harvard.edu/circurna/

GNU General Public License v3.0


What is the unit of count data from Annotation module? #19

Open yuzymatsuo opened 3 years ago

yuzymatsuo commented 3 years ago

Dear Jiang Li, Thank you for developing such a nice pipeline! I think it is very useful.

I wonder if you could help me understand the count data from the Annotation module.

What is the unit of the read count: a raw count or CPM (counts per million)? If the unit is CPM, how does your module normalize the counts to calculate it?

I would like to merge all the text files (e.g. _circRNA, miRNA, piRNA, snoRNA, snRNA and tRNA) into one.

Best regards, Yuzy

cougarlj commented 3 years ago

Dear Yuzy,

The count is simply the raw read count of each miRNA, that is, the number of reads from the fastq file that align to a given miRNA. For CPM (counts per million), the total miRNA count is scaled to one million and each miRNA receives its proportional share as its normalized count. You can calculate this yourself in R from the miRNA count file in the COMPSRA output.

To combine multiple files, you can put the full file names in a list and use the "merge" function as below:
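As a rough illustration of the normalization described above, here is a minimal Python sketch (the reply suggests R; Python is used here only for the example). It assumes the raw counts have already been read into a dictionary keyed by miRNA name. The function name `counts_to_cpm` and the example counts are hypothetical and are not part of COMPSRA.

```python
def counts_to_cpm(counts):
    """Scale raw read counts so that they sum to one million.

    `counts` maps a feature name (e.g. a miRNA ID) to its raw read count.
    The denominator is the total count over all features passed in.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total read count is zero; CPM is undefined")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical raw counts for three miRNAs (illustration only).
raw = {"hsa-miR-21-5p": 600, "hsa-miR-16-5p": 300, "hsa-let-7a-5p": 100}
cpm = counts_to_cpm(raw)
```

Note that the result depends on which features are included in the total: normalizing within the miRNA file alone, as described above, gives different values from normalizing over a merged table of all small RNA types.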

java -jar COMPSRA.jar -tk -merge -id 0 -count 2 -inf /fullpath/example_out/your_sample_list.txt -out /fullpath/example_out/your_sample_count.txt
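The list file passed to `-inf` could be prepared with a small script such as the sketch below. The thread does not specify the exact format of this file, so writing one full path per line is an assumption; check the COMPSRA documentation before relying on it. The helper name `write_sample_list` is hypothetical.

```python
from pathlib import Path


def write_sample_list(count_files, list_path):
    """Write the absolute path of each count file to `list_path`.

    Assumption: the merge tool expects one full file name per line.
    Returns the list of paths that were written.
    """
    paths = [str(Path(f).resolve()) for f in count_files]
    Path(list_path).write_text("\n".join(paths) + "\n")
    return paths
```

Calling it with the per-type output files of one or more samples and a destination such as your_sample_list.txt would produce the file named in the command above.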

If you have any questions, please let me know.

Best Wishes, Jiang Li

yuzymatsuo commented 3 years ago

Dear Jiang Li, Thank you for your kind help!

I have another question. Could you tell me the versions of the small RNA databases (miRBase, piRNABank, piRNACluster, gtRNAdb and circBase)? I was not able to find them in your paper or documentation. Is every database the latest version?

Best regards, Yuzy

cougarlj commented 3 years ago

Dear Yuzy,

Sorry for the inconvenience. These databases should have been the latest versions when we started to build COMPSRA. I list the versions below.

miRBase --> V21
piRNABank --> no update
piRNACluster --> failed to open the website
gtRNAdb --> GtRNAdb 2.0
circBase --> no update

In fact, as you can see, most of these databases have not been updated, and you can find them through the references in the COMPSRA paper.

Best Wishes, Jiang Li

yuzymatsuo commented 3 years ago

Dear Jiang Li,

Thank you so much :) Your answer is really helpful. Sorry, I am struggling with another problem. Could you help me solve it?

Because I want to run the Microbe module, I have tried to download the prebuilt microbial databases several times. Although I succeeded in downloading blast_archaea and blast_viruses, I was not able to download either blast_bacteria or blast_fungi.

Could you please tell me an alternative way to download the prebuilt databases (bacteria and fungi)?

For your information, I have attached the error message as follows.

cougarlj commented 3 years ago

Dear yuzymatsuo,

This is a good question, because many people have had problems downloading. The reason is that our server is very slow for web access because of some security issues. I may transfer these resources to other servers next year, but for now I am not sure. All the data are stored at https://regepi.bwh.harvard.edu/circurna/bundle_v1 . So maybe you can download the files directly through a browser or an FTP tool? I hope this helps.

Best Wishes, Jiang Li