cougarlj / COMPSRA

COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis
https://regepi.bwh.harvard.edu/circurna/
GNU General Public License v3.0

DEG was failed to run #4

Closed mars188 closed 3 years ago

mars188 commented 4 years ago

Hi, I ran the qc, alignment and annotation modules, which worked fine, but when I run the function module it gives me the following error. BTW, I am running this with the example data that I downloaded online.

my commandline: java -jar COMPSRA.jar -ref hg38 -fun -fd -fdclass 1,2,3,4,5,6 -fdcase 1-6 -fdctrl 7-12 -fdnorm cpm -fdtest mwu -fdann -pro COMPSRA_DEG -inf ./example/*fastq -out ./example_out/

Here is the output:

Working Directory: /scratch/gencore/ma5877_test_software/COMPSRA
Bundle Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1
Configuration Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1/configuration
Plug Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1/plug
N_CPU: 28

Configuration Info: The endogenous database configuration has been set!
Configuration Info: 36 databases are set!
15:30:20.063 [main] INFO edu.harvard.channing.compass.core.Configuration - The Configuration was completed!

The hg38 reference genome was set.
QC module will not be performed.
Alignment module will not be performed.
Annotation module will not be performed.
Microbe module will not be performed.

+++++++++++++++++++++++++++++++

15:30:23.101 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG -

| DEG |

15:30:23.105 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Class Parse ---> OK!
15:30:23.106 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Test Parse ---> OK!
15:30:23.107 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Sample Parse ---> OK!
15:30:23.109 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - DEG was failed to run.
15:30:23.110 [main] INFO edu.harvard.channing.compass.core.Produce - Function module was completed.

+++++++++++++++++++++++++++++++++++++++

Please help to fix this issue. Thanks in advance!

cougarlj commented 4 years ago

Hi, mars188,

I think this problem may be caused by the -inf parameter. You should provide a file that lists all the input filenames, one per line, rather than a wildcard pattern such as ./example/*fastq. For details, see the online manual.

Best Wishes, Jiang Li

mars188 commented 4 years ago

Thank you Jiang Li for your response.

I looked it up in the manual, which says the input file for -inf should be /example/sample.list. I tried pointing to the samples in different ways (the .txt files generated during the annotation step) but still get the same error. I also tried pointing to the .fastq files but still get the error.

Can you please help me figure out exactly how "sample.list" should be formatted? Also, for -fdcase 1-6 and -fdctrl 7-12, should I give sample names and paths for cases vs. controls, or are 1-6 and 7-12 by themselves enough?

This may be trivial for you but I am stuck on this stage. I would really appreciate your help!

Many thanks,

cougarlj commented 4 years ago

Dear mars188,

Sorry for the inconvenience. It seems that I forgot to provide a description of the sample.list file. In the sample.list file, you should list the output files from the annotation module in one column, one file per line. Here is an example.

[sample.list]
\your\file\path\your_output_sample01_STAR_Aligned_miRNA.txt
\your\file\path\your_output_sample02_STAR_Aligned_miRNA.txt
\your\file\path\your_output_sample03_STAR_Aligned_miRNA.txt
. . .
\your\file\path\your_output_sample12_STAR_Aligned_miRNA.txt

Best Wishes, Jiang
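
The one-file-per-line layout described above can be generated instead of typed by hand. The following is a minimal Python sketch, not part of COMPSRA: the function name `write_sample_list` and the default suffix are assumptions chosen to match the file names quoted in this thread. It collects the annotation outputs under a directory, sorts them, and writes their absolute paths to a list file.

```python
from pathlib import Path


def write_sample_list(annotation_dir, list_path, suffix="_STAR_Aligned_miRNA.txt"):
    """Write one absolute path per line for every annotation output under annotation_dir.

    Hypothetical helper: the name and the default suffix are illustrative only.
    """
    root = Path(annotation_dir).resolve()
    # rglob also descends into per-sample subfolders such as example_out/sample01/.
    files = sorted(p for p in root.rglob("*" + suffix) if p.is_file())
    # One path per line, in sorted order, so line position is predictable.
    Path(list_path).write_text("".join(f"{p}\n" for p in files))
    return files
```

Sorting is lexicographic, so the resulting order only matches the sample numbering when the names are zero-padded (sample01 ... sample12), as they are in the example data.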

mars188 commented 4 years ago

Hi Jiang, thank you for getting back to me. I have tried all this but still get the error "DEG was failed to run". Here is my whole command for QC, alignment and annotation:

java -jar COMPSRA.jar -ref hg38 -qc -ra TGGAATTCTCGGGTGCCAAGG -rb 4 -rh 20 -rt 20 -rr 20 -rlh 8,17 -aln -mt star -ann -ac 1 -inf ./example/sample.list -out ./example_out/

Above, I chose -ac 1 because I wanted to analyze only miRNA. The sample.list file contained the paths to my .fastq files. The output looked good and included the following required files:
sample01_17to50_FitRead_STAR_Aligned_miRNA.txt
sample02_17to50_FitRead_STAR_Aligned_miRNA.txt
... ... and so on.

Then I ran the function command as below:
java -jar COMPSRA.jar -ref hg38 -fun -fd -fdclass 1 -fdcase 1-6 -fdctrl 7-12 -fdnorm cpm -fdtest mwu -fdann -pro COMPSRA_DEG -inf /scratch/gencore/ma5877_test_software/COMPSRA/example_out/samples_list -out /scratch/gencore/ma5877_test_software/COMPSRA/example_out/

The "samples_list" contained the paths to the miRNA.txt files (output files of the annotation module), as you mentioned.

When I executed this command, I got the following error: "DEG was failed to run"

Can you please help me solve this issue? The only thing I have doubts about is -fdcase 1-6 -fdctrl 7-12. How does the program identify that 1-6 are sample01, sample02 ... sample06 for cases and 7-12 are sample07 ... sample12 for controls?

I am working on a project for miRNA data analysis and I would really appreciate your help in solving this issue.

THANK YOU!

cougarlj commented 4 years ago

Dear mars188,

I'm glad to help you. First, I think you should provide the full path of each file in the samples_list file. For example, /home/tool/compsra/out/sample01_17to50_FitRead_STAR_Aligned_miRNA.txt
Second, "1-6" means the first through the sixth file in the order they are listed in the samples_list. The same applies to "7-12".

Please let me know whether it does work. If not, please paste all the output on the screen, which will help me to debug.

Best Wishes, Jiang
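
To check which files a given -fdcase / -fdctrl pair would select, the positional rule described above can be mirrored in a few lines of Python. This is a sketch of the rule as stated in this thread, not COMPSRA's actual parser: the function names are hypothetical, only the "start-end" form is handled, and skipping blank lines is an assumption.

```python
def parse_range(spec):
    """Turn a 'start-end' string such as '1-6' into 1-based positions [1, ..., 6]."""
    start, end = (int(part) for part in spec.split("-"))
    if start < 1 or end < start:
        raise ValueError(f"bad range: {spec!r}")
    return list(range(start, end + 1))


def assign_groups(sample_list_lines, case="1-6", ctrl="7-12"):
    """Map each listed file to 'case' or 'ctrl' by its 1-based line position."""
    # Assumption: blank lines are ignored when counting positions.
    paths = [line.strip() for line in sample_list_lines if line.strip()]
    groups = {}
    for label, spec in (("case", case), ("ctrl", ctrl)):
        for position in parse_range(spec):
            if position > len(paths):
                raise ValueError(
                    f"{label} position {position} exceeds {len(paths)} listed files"
                )
            groups[paths[position - 1]] = label
    return groups
```

Calling `assign_groups(open("samples_list"))` and printing the result before running the function module makes it easy to spot a list whose order does not match the intended case/control split.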

mars188 commented 4 years ago

Dear Jiang, thank you for being helpful. I did use the full path to miRNA.txt files in the sample_list but still got that error.

mars188 commented 4 years ago

Here is the command that I executed:

java -jar COMPSRA.jar -ref hg38 -fun -fd -fdclass 1 -fdcase 1-6 -fdctrl 7-12 -fdnorm cpm -fdtest mwu -fdann -pro COMPSRA_DEG -inf /scratch/gencore/ma5877_test_software/COMPSRA/example_out/samples_list -out /scratch/gencore/ma5877_test_software/COMPSRA/example_out/

The samples_list contains the FULL path to each miRNA.txt file, as follows:
/scratch/gencore/ma5877_test_software/COMPSRA/example_out/sample01/sample01_17to50_FitRead_STAR_Aligned_miRNA.txt
/scratch/gencore/ma5877_test_software/COMPSRA/example_out/sample02/sample02_17to50_FitRead_STAR_Aligned_miRNA.txt
... .. and so on.

mars188 commented 4 years ago

Please see below all the output:

+++++++++++++++++++++++++++++++

Working Directory: /scratch/gencore/ma5877_test_software/COMPSRA
Bundle Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1
Configuration Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1/configuration
Plug Directory: /scratch/gencore/ma5877_test_software/COMPSRA/bundle_v1/plug
N_CPU: 28

Configuration Info: The endogenous database configuration has been set!
Configuration Info: 36 databases are set!
10:25:09.350 [main] INFO edu.harvard.channing.compass.core.Configuration - The Configuration was completed!

The hg38 reference genome was set.
QC module will not be performed.
Alignment module will not be performed.
Annotation module will not be performed.
Microbe module will not be performed.

+++++++++++++++++++++++++++++++

10:25:09.407 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG -

| DEG |

10:25:09.409 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Class Parse ---> OK!
10:25:09.409 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Test Parse ---> OK!
10:25:09.410 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - Sample Parse ---> OK!
10:25:09.412 [pool-2-thread-1] INFO edu.harvard.channing.compass.core.fun.DEG - DEG was failed to run.
10:25:09.413 [main] INFO edu.harvard.channing.compass.core.Produce - Function module was completed.

+++++++++++++++++++++++++++++++++++++++

mars188 commented 4 years ago

I would really appreciate your help in solving this issue. Thanks in advance!

cougarlj commented 4 years ago

Dear mars188,

Thank you for the response. Based on the output, can you remove the -fdann parameter and re-run your command? If it still doesn't work, please let me know.

Best Wishes, Jiang

mars188 commented 4 years ago

Dear Jiang, Yessss, it is working now. Thank you soooo much for your help.

Just two minor questions.

  1. Next time, if I want to analyze more than one small RNA type, e.g. miRNA and piRNA, then I can use the -fdann option, right?

  2. I have different adapter sequences (other than the options given in the manual) that I want to use for trimming. Can I simply use those sequences for trimming, or has the software been designed to trim only the ones mentioned in the manual?

Thank you again for all your help. Cheers!

cougarlj commented 4 years ago

Dear mars188,

Sorry for the inconvenience. The actual behavior of -fdann seems to have changed from the original design. If your samples use different adapter sequences, I suggest running each sample separately, with its own adapter sequence, from the QC module through the Annotation module. Then you can use -fun -fd to test the samples, or use -fun -fm to merge the outputs of all samples and handle more complex downstream analysis in R. If you run into any problems, let me know.

Best Wishes, Jiang
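
The per-sample approach suggested above can be scripted so that each run gets its own adapter. The Python sketch below only builds the command lines; it reuses the QC, alignment and annotation flags from the command quoted earlier in this thread and changes only the -ra and -inf values. The function name `build_commands`, the per-sample list file names, and the idea of giving each run its own list file are assumptions, not documented COMPSRA usage.

```python
def build_commands(adapters, jar="COMPSRA.jar", out_dir="./example_out/"):
    """Build one COMPSRA command per input list, each with its own 3' adapter.

    adapters maps a list-file path (hypothetical, one per sample or batch)
    to the adapter sequence to pass through -ra.
    """
    commands = []
    for list_file, adapter in adapters.items():
        commands.append([
            "java", "-jar", jar, "-ref", "hg38",
            # QC settings copied from the command earlier in the thread; only -ra varies.
            "-qc", "-ra", adapter, "-rb", "4", "-rh", "20", "-rt", "20",
            "-rr", "20", "-rlh", "8,17",
            "-aln", "-mt", "star",
            "-ann", "-ac", "1",
            "-inf", list_file, "-out", out_dir,
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; printing the commands first is a cheap way to review them before anything is executed.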

mars188 commented 4 years ago

Dear Jiang,

Thank you for your help and time. I have completed the pipeline and found differentially expressed miRNA in my experiments. Now, I want to perform two additional steps.

  1. calculate False Discovery Rate (FDR)
  2. control for some covariates (sex, age etc)

Does COMPSRA support this? I would really appreciate your help.

Many thanks,

cougarlj commented 4 years ago

Dear mars188,

Thank you for your response. Currently, the main function of COMPSRA is to identify and annotate miRNAs. You can run COMPSRA through the first three modules and then merge the output files to build a miRNA count profile. After that, you can analyze this data.frame directly in R. You can calculate FDR with p.adjust and control for covariates with the lm or glm functions. I may add these analyses in a future release of COMPSRA. Thank you very much for these suggestions.

Best Wishes, Jiang
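
For readers who want to see what the FDR step does, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment, the procedure that R's p.adjust applies when called with method = "BH". It is an illustration, not COMPSRA code; the function name is hypothetical and the input is assumed to be a plain list of p-values, one per miRNA.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

A miRNA is then called significant at, say, a 5% FDR when its adjusted value is at most 0.05. Adjusting for covariates such as sex or age is a separate regression step and is not covered by this sketch.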

mars188 commented 4 years ago

Dear Jiang, your comments have been very helpful in understanding the analysis steps and moving on with the downstream analysis.

I now understand the downstream workflow that you mentioned, but I am new to R. Can you please suggest (and send me a link to) an R package and/or tutorial for fitting a glm to RNA counts, where I can control for covariates and calculate FDR?

I would really appreciate your help.

Many thanks again for all your time!

cougarlj commented 4 years ago

Dear mars188,

If you want to use an R package, you can refer to DESeq2 (http://bioconductor.org/packages/release/bioc/html/DESeq2.html). Although it is designed for RNA-Seq, some colleagues in my group still use it to handle small RNA-Seq data. If you only want to perform regression and calculate FDR, you can use glm() and p.adjust() directly in R without any extra package.

Best Wishes, Jiang