bodegalab / irescue

A tool to quantify transposable elements expression in scRNA-seq
MIT License
9 stars 1 forks source link

ValueError: range() arg 3 must not be zero #1

Closed xiangyupan closed 1 year ago

xiangyupan commented 1 year ago

Hi, Thanks for the nice job in bioRxiv, hope it will be successfully accepted in a good journal. Now I want to use it to quantificate my single-cell RNA-seq data. An error occurred when I ran the command nohup ~/anaconda3/envs/irescue/bin/irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz & . I am not sure what caused the error. Hope for your reply and help.

Thanks for your time and work.

图片

bepoli commented 1 year ago

Hi @xiangyupan, thank you for your interest in IRescue.

Could you check if the files IRescue_out/barcodes.tsv.gz and IRescue_out/barcodes.tsv.gz are empty? If they are not empty, please, copy and paste here the output of zcat barcodes.tsv.gz | head | cat -et and zcat barcodes.tsv.gz | head | cat -et.

Cheers B

xiangyupan commented 1 year ago

Hi bepoli, Thanks for your fast response! Yes ,the IRescue_out/barcodes.tsv.gz and IRescue_out/features.tsv.gz files are empty. And the matrix.tsv.gz file is not generated. 图片

bepoli commented 1 year ago

Ok, this narrows down the problem.

One reason could be that bedtools and/or samtools are not in your user's PATH. Please try bedtools --version and samtools --version to make sure you have them installed.

I noticed that you executed IRescue from ~/anaconda3/envs/irescue/bin/irescue. Did you actually activated the conda environment? If not, activate it with conda activate irescue and then run the analysis again.

xiangyupan commented 1 year ago

Thanks for your reply. I make sure that bedtools and samtools are both in my PATH just as follows. 图片 Now I will try the second suggestion, as my bam file is huge (71Gb), I will feedback my results a little late. Thanks again!

bepoli commented 1 year ago

Now I will try the second suggestion, as my bam file is huge (71Gb), I will feedback my results a little late. Thanks again!

Please, when you run irescue, add the options -v and --keeptmp, so we can troubleshoot better.

xiangyupan commented 1 year ago

Hi bepoli, Fowllowing your suggestions, I reran the irescue yesterday, but the same error occurred. The latese command is nohup irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz -v --keeptmp &. And the log file contains some process infomation as follows. It seems that no other errors during the mapping or intersecting process, but the ./IRescue_out//barcodes.tsv.gz and ./IRescue_out//features.tsv.gz files are still empty. Do you need other files to troubleshoot this error? Thanks for your help.

图片

bepoli commented 1 year ago

Hi @xiangyupan , now we should be able to find the problem.

xiangyupan commented 1 year ago

Hi, Sorry for the delayed response. I have checked my bed file and Bam file, the chromosomes are identical. The 'C0XXX' IDs are scaffold of the axolotl genome. I randomly select 100 transposable elements and write it into "All.TE.deal.random100.bed.txt" file. And the "dp0.bam.header.txt" file is the header infomation of the single-cell RNA-seq bam. I have the IRescue_tmp directory and the head info of files in refs (chr9q4.bed.gz) and isec (chr9q4.isec.bed.gz) are also as follows. Thanks for your time.

All.TE.deal.random100.bed.txt dp0.bam.header.txt chr9q4.bed.gz chr9q4.isec.bed.gz

bepoli commented 1 year ago

Hi @xiangyupan,

No problem, thank you for getting back to me.

I have found and fixed two tiny bugs that could cause your problem. I think that the read names in your BAM file end with /1, is it the case? Also, I see that you are running the analysis at superfamily/class level, not subfamily, and this was an use case that I overlooked. Now it should work with your input BAM and BED files.

So let's try this:

If you still receive the error, then there is something else that we need to check, but at least we would rule out a couple of possible bugs.

xiangyupan commented 1 year ago

Thank you very much bepoli. The update version works successfully in my job. Now I have got the IRescue_out directory within three files now. IRescue_out/ ├── barcodes.tsv.gz ├── features.tsv.gz └── matrix.mtx.gz Then I can use these to go further. Thanks again! 图片

bepoli commented 1 year ago

Good! Thank you for taking time to debug. I just merged the fixes in version 1.0.2, which will be available in conda in a few hours. If you need more help you can write here or open a new issue.

mikecuoco commented 1 year ago

Hi @bepoli. Thanks for this great method. I'm using IRescue v1.0.2 and encountering this same problem.

  1. My IRescue_out/barcodes.tsv.gz and IRescue_out/features.tsv.gz files are both empty
  2. My environment has samtools 1.16 and bedtools 2.30.0
  3. The chromosome names in the rmsk.bed file match those in my bam file

Here's the IRescue output from my run.

(irescue) 
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main*
$ irescue --bam .test/results/map_count/a/Aligned.sortedByCoord.out.bam --genome hg38 -p 4
[02/14/2023 - 19:27:27] IRescue job starts
[02/14/2023 - 19:27:27] Downloading and parsing RepeatMasker annotation for assembly hg38 from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.out.gz...
[02/14/2023 - 19:27:48] Writing mapped barcodes to ./IRescue_out//barcodes.tsv.gz
[02/14/2023 - 19:27:48] Writing mapped features to ./IRescue_out//features.tsv.gz
Traceback (most recent call last):
  File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/bin/irescue", line 10, in <module>
    sys.exit(main())
  File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/main.py", line 86, in main
    bc_per_thread = list(split_bc(barcodes_file, args.threads))
  File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/count.py", line 126, in split_bc
    for chunk in split_int(bclen, n):
  File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/count.py", line 112, in split_int
    for i in range(0, num, split):
ValueError: range() arg 3 must not be zero

And here are the heads from IRescue_tmp files

(irescue) 
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main
$ zcat IRescue_tmp/refs/chr21.bed.gz | head
chr21   5010000 5010061 AluYk11~SINE/Alu        465     -
chr21   5010070 5010146 OldhAT1~DNA/hAT-Ac      201     -
chr21   5010433 5010563 MER4D~LTR/ERV1  1253    -
chr21   5010563 5010892 AluY~SINE/Alu   2392    -
chr21   5010892 5010955 MER4D~LTR/ERV1  1253    -
chr21   5010955 5011213 AluJb~SINE/Alu  1590    +
chr21   5011213 5011399 MER4D~LTR/ERV1  1253    -
chr21   5011430 5011555 MER4D~LTR/ERV1  494     -
chr21   5011977 5012144 L1MA10~LINE/L1  341     -
chr21   5012275 5012356 L2c~LINE/L2     194     +

gzip: stdout: Broken pipe
(irescue) 
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main
$ zcat IRescue_tmp/isec/chr21.isec.bed.gz | head
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                AluJo~SINE/Alu
                MLT1D~LTR/ERVL-MaLR
                MLT1D~LTR/ERVL-MaLR
                AluSx1~SINE/Alu

gzip: stdout: Broken pipe
bepoli commented 1 year ago

Hi @mikecuoco, thank you for the detailed report. The `chr21.isec.bed.gz' file misses the cell barcode and UMI sequences, so I think it could be something in your BAM file that IRescue doesn't recognize yet.

Could you paste here a few lines from your BAM file?

Let's also check the integrity of the header: could you run samtools idxstats Aligned.sortedByCoord.out.bam and check if you obtain a 4-column text with chromosome name, length, mapped and unmapped reads?

mikecuoco commented 1 year ago

Hi @bepoli, thanks for the quick response! It turns out STARsolo does not include the CR and CB tags by default. I added the following line to my STARsolo call and IRescue works fine now.

--outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM
bepoli commented 1 year ago

Great! I'm happy to hear that!