Closed xiangyupan closed 1 year ago
Hi @xiangyupan, thank you for your interest in IRescue.
Could you check if the files IRescue_out/barcodes.tsv.gz
and IRescue_out/barcodes.tsv.gz
are empty?
If they are not empty, please, copy and paste here the output of zcat barcodes.tsv.gz | head | cat -et
and zcat barcodes.tsv.gz | head | cat -et
.
Cheers B
Hi bepoli,
Thanks for your fast response!
Yes ,the IRescue_out/barcodes.tsv.gz and IRescue_out/features.tsv.gz files are empty. And the matrix.tsv.gz file is not generated.
Ok, this narrows down the problem.
One reason could be that bedtools and/or samtools are not in your user's PATH.
Please try bedtools --version
and samtools --version
to make sure you have them installed.
I noticed that you executed IRescue from ~/anaconda3/envs/irescue/bin/irescue
. Did you actually activated the conda environment? If not, activate it with conda activate irescue
and then run the analysis again.
Thanks for your reply.
I make sure that bedtools and samtools are both in my PATH just as follows.
Now I will try the second suggestion, as my bam file is huge (71Gb), I will feedback my results a little late.
Thanks again!
Now I will try the second suggestion, as my bam file is huge (71Gb), I will feedback my results a little late. Thanks again!
Please, when you run irescue, add the options -v
and --keeptmp
, so we can troubleshoot better.
Hi bepoli,
Fowllowing your suggestions, I reran the irescue yesterday, but the same error occurred.
The latese command is nohup irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz -v --keeptmp &
.
And the log file contains some process infomation as follows.
It seems that no other errors during the mapping or intersecting process, but the ./IRescue_out//barcodes.tsv.gz and ./IRescue_out//features.tsv.gz files are still empty.
Do you need other files to troubleshoot this error?
Thanks for your help.
Hi @xiangyupan , now we should be able to find the problem.
head
of your BED and BAM file?IRescue_tmp
directory. Inside you will find two directories refs
and isec
. Please check if the files in each directory are empty (you can just check one file inside refs
and one file inside isec
). If they are not empty, send me a head
of each file.Hi, Sorry for the delayed response. I have checked my bed file and Bam file, the chromosomes are identical. The 'C0XXX' IDs are scaffold of the axolotl genome. I randomly select 100 transposable elements and write it into "All.TE.deal.random100.bed.txt" file. And the "dp0.bam.header.txt" file is the header infomation of the single-cell RNA-seq bam. I have the IRescue_tmp directory and the head info of files in refs (chr9q4.bed.gz) and isec (chr9q4.isec.bed.gz) are also as follows. Thanks for your time.
All.TE.deal.random100.bed.txt dp0.bam.header.txt chr9q4.bed.gz chr9q4.isec.bed.gz
Hi @xiangyupan,
No problem, thank you for getting back to me.
I have found and fixed two tiny bugs that could cause your problem. I think that the read names in your BAM file end with /1
, is it the case? Also, I see that you are running the analysis at superfamily/class level, not subfamily, and this was an use case that I overlooked. Now it should work with your input BAM and BED files.
So let's try this:
irescue
conda environment, run pip install ./IRescue-1.0.1.dev5+gb83049a.tar.gz
.irescue --version
. The result should be IRescue 1.0.1.dev5+gb83049a
.If you still receive the error, then there is something else that we need to check, but at least we would rule out a couple of possible bugs.
Thank you very much bepoli.
The update version works successfully in my job.
Now I have got the IRescue_out directory within three files now.
IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
Then I can use these to go further. Thanks again!
Good! Thank you for taking time to debug. I just merged the fixes in version 1.0.2, which will be available in conda in a few hours. If you need more help you can write here or open a new issue.
Hi @bepoli. Thanks for this great method. I'm using IRescue v1.0.2 and encountering this same problem.
IRescue_out/barcodes.tsv.gz
and IRescue_out/features.tsv.gz
files are both emptyrmsk.bed
file match those in my bam fileHere's the IRescue output from my run.
(irescue)
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main*
$ irescue --bam .test/results/map_count/a/Aligned.sortedByCoord.out.bam --genome hg38 -p 4
[02/14/2023 - 19:27:27] IRescue job starts
[02/14/2023 - 19:27:27] Downloading and parsing RepeatMasker annotation for assembly hg38 from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.out.gz...
[02/14/2023 - 19:27:48] Writing mapped barcodes to ./IRescue_out//barcodes.tsv.gz
[02/14/2023 - 19:27:48] Writing mapped features to ./IRescue_out//features.tsv.gz
Traceback (most recent call last):
File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/bin/irescue", line 10, in <module>
sys.exit(main())
File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/main.py", line 86, in main
bc_per_thread = list(split_bc(barcodes_file, args.threads))
File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/count.py", line 126, in split_bc
for chunk in split_int(bclen, n):
File "/raidixshare_log-g/mcuoco/miniconda3/envs/irescue/lib/python3.10/site-packages/irescue/count.py", line 112, in split_int
for i in range(0, num, split):
ValueError: range() arg 3 must not be zero
And here are the heads from IRescue_tmp
files
(irescue)
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main
$ zcat IRescue_tmp/refs/chr21.bed.gz | head
chr21 5010000 5010061 AluYk11~SINE/Alu 465 -
chr21 5010070 5010146 OldhAT1~DNA/hAT-Ac 201 -
chr21 5010433 5010563 MER4D~LTR/ERV1 1253 -
chr21 5010563 5010892 AluY~SINE/Alu 2392 -
chr21 5010892 5010955 MER4D~LTR/ERV1 1253 -
chr21 5010955 5011213 AluJb~SINE/Alu 1590 +
chr21 5011213 5011399 MER4D~LTR/ERV1 1253 -
chr21 5011430 5011555 MER4D~LTR/ERV1 494 -
chr21 5011977 5012144 L1MA10~LINE/L1 341 -
chr21 5012275 5012356 L2c~LINE/L2 194 +
gzip: stdout: Broken pipe
(irescue)
mcuoco at logg-cluster-1.salk.edu in ~/workflows/scrnaseq on main
$ zcat IRescue_tmp/isec/chr21.isec.bed.gz | head
AluJo~SINE/Alu
AluJo~SINE/Alu
AluJo~SINE/Alu
AluJo~SINE/Alu
AluJo~SINE/Alu
AluJo~SINE/Alu
AluJo~SINE/Alu
MLT1D~LTR/ERVL-MaLR
MLT1D~LTR/ERVL-MaLR
AluSx1~SINE/Alu
gzip: stdout: Broken pipe
Hi @mikecuoco, thank you for the detailed report. The `chr21.isec.bed.gz' file misses the cell barcode and UMI sequences, so I think it could be something in your BAM file that IRescue doesn't recognize yet.
Could you paste here a few lines from your BAM file?
Let's also check the integrity of the header: could you run samtools idxstats Aligned.sortedByCoord.out.bam
and check if you obtain a 4-column text with chromosome name, length, mapped and unmapped reads?
Hi @bepoli, thanks for the quick response! It turns out STARsolo does not include the CR and CB tags by default. I added the following line to my STARsolo call and IRescue works fine now.
--outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM
Great! I'm happy to hear that!
Hi, Thanks for the nice job in bioRxiv, hope it will be successfully accepted in a good journal. Now I want to use it to quantificate my single-cell RNA-seq data. An error occurred when I ran the command
nohup ~/anaconda3/envs/irescue/bin/irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz &
. I am not sure what caused the error. Hope for your reply and help.Thanks for your time and work.