HCGB-IGTP / XICRA

Small RNAseq pipeline for paired-end reads
MIT License

miRNA module error #21

Closed Ben7124 closed 2 years ago

Ben7124 commented 3 years ago

Hi Jose,

I am getting an error when I run the following command. I run it after the join command, which worked fine:

XICRA miRNA --input CTL -t 12 --software miraligner optimir

The output is below. The summarizing of the results fails for some reason. Can you please help me? In a previous post, another user had the same issue, and it turned out to be a problem with the trimm command. My trimm command is:

XICRA trimm --input CTL --adapters_a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapters_A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -t 12 --extra '-U 3 --minimum-length 15'

Could the --extra option be preventing the mapping from working properly? Basically, I want to trim the adapters and also remove the first 3 bases of read 2. Thanks for your help!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bioinformatics/.local/bin/XICRA", line 342, in <module>
    args.func(args)
  File "/home/bioinformatics/.local/lib/python3.6/site-packages/XICRA/modules/miRNA.py", line 267, in run_miRNA
    generate_DE.generate_DE(results_df, options.debug, expression_folder)
  File "/home/bioinformatics/.local/lib/python3.6/site-packages/XICRA/scripts/generate_DE.py", line 67, in generate_DE
    all_data_filtered, all_data_duplicated = discard_UID_duplicated(all_data, type_res=type_analysis)
  File "/home/bioinformatics/.local/lib/python3.6/site-packages/XICRA/scripts/generate_DE.py", line 88, in discard_UID_duplicated
    new_data[type_res] = tmp[0]
  File "/home/bioinformatics/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/bioinformatics/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

Ben7124 commented 3 years ago

I fixed it by removing the -U 3 from the trimm command. Is there any way to remove the first 3 bases of read 2 with the trimm command, along with the adapters?

JFsanchezherrero commented 3 years ago

Hi there,

I guess that after removing 3 bp as you were doing, all reads were shorter than 15 bp and were filtered out during trimming. The error indicates that miraligner identified no data during the miRNA analysis of each sample.
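One way to check this guess is to count how many reads would fall below the minimum length once the leading bases are removed. The following is only a minimal sketch: the function name count_short_reads is made up for illustration, it assumes plain 4-line FASTQ records (optionally gzip-compressed), and it only mimics the fixed-length cut, not the adapter trimming itself.

```shell
# Count reads in a FASTQ file that are shorter than a minimum length,
# optionally after removing a fixed number of leading bases.
# Hypothetical helper, not part of XICRA or cutadapt.
# Usage: count_short_reads FILE MIN_LENGTH [CUT]
# Prints: "<reads below minimum> <total reads>"
count_short_reads() {
    local fastq="$1" min_len="$2" cut="${3:-0}"
    local reader="cat"
    # Read gzip-compressed files transparently.
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;
    esac
    $reader "$fastq" | awk -v min="$min_len" -v cut="$cut" '
        NR % 4 == 2 {                  # sequence line of each 4-line record
            total++
            if (length($0) - cut < min) short++
        }
        END { printf "%d %d\n", short + 0, total + 0 }'
}
```

For example, `count_short_reads sample_R2.fastq 15 3` (with a file name of your own) reports how many R2 reads would be shorter than 15 bp after dropping their first 3 bases. If the first number equals the second, every read would be discarded by the length filter.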

When you mention that you ran trimm without the -U 3 option, I guess you ran XICRA join afterwards, right?

Another concern is optimir: it can cause problems if samtools is not correctly installed. This is a known issue, but I have not found a solution yet.
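A quick way to rule out the simplest case is to check that the required programs can be found on the PATH. This is a minimal sketch with a made-up helper name (check_tools). It only confirms that a program is present, not that the installation actually works.

```shell
# Report whether each named program can be found on PATH.
# Hypothetical helper, not part of XICRA.
# Returns non-zero if at least one program is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tools samtools` before starting the miRNA module would show whether samtools is visible from the shell that launches XICRA.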

I am afraid I don't know what else to say here. I have read the cutadapt documentation and it basically recommends using the -U 3 option, as you mentioned. Also, are you sure you only need to remove 3 bp from R2?

I will ask around and let you know if there is any news. Best regards,

Ben7124 commented 3 years ago

Thanks Jose. I was wondering how I can use XICRA for single end reads? Do I just name the samples: SAMPLE1_R1.fastq.gz? And I skip the join steps? But doesn't the miRNA module look for a _joined file name? Thanks again.

JFsanchezherrero commented 3 years ago

Hi there,

Sorry for the delay. I guess I read the issue and completely forgot about it.

Sure, you can use single-end data with XICRA. You might be interested in reusing your already trimmed data; in that case, you will need to rename the samples.

Also, as we determined in the original publication of XICRA (see link to paper), you might want to use only R1 reads. R2 reads tend to have lower quality values and would increase the number of falsely discovered variants.

For example, I would do something like the following.

  1. Copy and rename R1 read files
mkdir trimmed_R1_reads
for i in XICRA_analysis/data/*/trimm/*R1.fastq
do
     # copy each trimmed R1 file, renaming NAME.fastq to NAME_trim.fastq
     cp "$i" "trimmed_R1_reads/$(basename "$i" .fastq)_trim.fastq"
done
  2. Run the XICRA modules with the --detached option. For example, for the miRNA module:
XICRA miRNA --detached -i trimmed_R1_reads -o miRNA_results_R1_reads --software ...

Let me know if you have any problems running this. I have just typed the commands and have not tested them myself, so watch out for typos. Please double-check the shell command that copies and renames the samples, and read the details of the --detached option carefully.
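Since the copy-and-rename step is untested, it may help to preview what it would do before touching any files. The sketch below is one way to do that, under these assumptions: the function name copy_r1_reads is made up, the trimmed files sit under SRC_DIR/*/trimm/ as in the command above, and the _trim.fastq target naming simply follows the suggestion in this thread. Whether that naming is what --detached expects should be checked against the XICRA documentation.

```shell
# Print (dry run) or perform the copy of every *R1.fastq file found under
# SRC_DIR/*/trimm/ into DEST_DIR, renaming NAME.fastq to NAME_trim.fastq.
# Hypothetical helper, not part of XICRA.
# Usage: copy_r1_reads SRC_DIR DEST_DIR [run]
copy_r1_reads() {
    local src="$1" dest="$2" mode="${3:-dry}"
    local f target
    for f in "$src"/*/trimm/*R1.fastq; do
        [ -e "$f" ] || continue        # the glob matched nothing
        target="$dest/$(basename "$f" .fastq)_trim.fastq"
        if [ "$mode" = "run" ]; then
            mkdir -p "$dest"
            cp "$f" "$target"
        else
            echo "cp $f $target"       # dry run: only show the planned copy
        fi
    done
}
```

Calling `copy_r1_reads XICRA_analysis/data trimmed_R1_reads` lists the planned copies. Adding `run` as a third argument performs them.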

Best regards, Jose