Closed beyondpie closed 4 months ago
Hi @jayluo2
Hi @beyondpie, thanks again for the BAM files! I am also wondering if raw fragments.tsv.gz files (for each sample) can be provided, as this would facilitate regrouping of sample fragments into their corresponding cell types?
Update: I will try make_fragment_file() from SnapATAC2 first.
@jayluo2 Sure. I will give it a look! I recently have some personal things to handle, will reply to you next week (please feel free to let me know if you need other data, we can also have zoom meeting if needed). Thanks! Songpeng
Hi @beyondpie, I am using the following command to generate fragments.tsv files, as in this script
stat = snapatac2.pp.make_fragment_file(bam_path,
fragments_filename,
is_paired = True,
barcode_regex = "^(\w+):.+",
shift_left = 4,
shift_right = -4,
min_mapq = 30
)
When this command is run sequentially (per bam file), some fragments files are successfully generated with complete stats files, but ultimately produced this error
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: Uncategorized, message: "Input/output error" }', /root/.cargo/git/checkouts/snapatac2-98bd0f054bf149f3/7154268/snapatac2-core/src/preprocessing/fragment.rs:141:43
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[10], line 2
1 for ind, bam_path in enumerate(bam_filenames_absolute_path_l):
----> 2 make_fragments_file_all(bam_path)
Cell In[9], line 19, in make_fragments_file_all(input_path)
17 t1 = time.time()
18 input_list = [bam_file_absolute_path_i, fragments_filename_i]
---> 19 stat = make_fragments_file_i(input_list)
21 with open(fragments_stats_file, 'w') as f:
22 f.writelines(str(stat))
Cell In[8], line 5, in make_fragments_file_i(input_list)
2 bam_absolute_path_i = input_list[0]
3 fragments_filename = input_list[1]
----> 5 stat = snapatac2.pp.make_fragment_file(bam_absolute_path_i,
6 fragments_filename,
7 is_paired = True,
8 barcode_regex = "^(\w+):.+",
9 shift_left = 4,
10 shift_right = -4,
11 min_mapq = 30
12 )
13 return stat
File ~/miniconda3/envs/conda_env/lib/python3.8/site-packages/snapatac2/preprocessing/_basic.py:88, in make_fragment_file(bam_file, output_file, is_paired, barcode_tag, barcode_regex, umi_tag, umi_regex, shift_left, shift_right, min_mapq, chunk_size)
85 if barcode_tag is not None and barcode_regex is not None:
86 raise ValueError("Only one of barcode_tag or barcode_regex can be set.")
---> 88 return internal.make_fragment_file(
89 bam_file, output_file, is_paired, shift_left, shift_right, chunk_size,
90 barcode_tag, barcode_regex, umi_tag, umi_regex, min_mapq,
91 )
PanicException: called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: Uncategorized, message: "Input/output error" }
When the command is run in parallel (using joblib), I receive errors that appear to differ between sample bam files. For example, for CEMBA171206_3C, I get
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 28, kind: StorageFull, message: "No space left on device" })', /root/.cargo/git/checkouts/snapatac2-98bd0f054bf149f3/7154268/snapatac2-core/src/preprocessing/mark_duplicates.rs:164:47
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "bam_to_fragments.py", line 55, in <module>
make_fragments_file_all(bam_path)
File "bam_to_fragments.py", line 38, in make_fragments_file_all
stat = snapatac2.pp.make_fragment_file(bam_file_absolute_path_i,
File "/home/groups/tttt/xjluo/miniconda3/envs/single_cell/lib/python3.8/site-packages/snapatac2/preprocessing/_basic.py", line 88, in make_fragment_file
return internal.make_fragment_file(
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Io(Os { code: 28, kind: StorageFull, message: "No space left on device" })
(the “No space left on device” is unexpected, since I am writing to a large storage repository)
but for CEMBA171219_4D I have
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /root/.cargo/git/checkouts/snapatac2-98bd0f054bf149f3/7154268/snapatac2-core/src/preprocessing/mark_duplicates.rs:426:43
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "bam_to_fragments.py", line 55, in <module>
make_fragments_file_all(bam_path)
File "bam_to_fragments.py", line 38, in make_fragments_file_all
stat = snapatac2.pp.make_fragment_file(bam_file_absolute_path_i,
File "/home/groups/tttt/xjluo/miniconda3/envs/single_cell/lib/python3.8/site-packages/snapatac2/preprocessing/_basic.py", line 88, in make_fragment_file
return internal.make_fragment_file(
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
I’m not sure if these errors are specific to particular samples. I also noticed that running this code in a Git repository appeared to generated many gigabytes (> 400G) of content (I did not check the specific files/source of this) in the folder from which the script is run (not the folder to which fragments files are written). It’d be great to get feedback on this!
Alternatively, it’d be great to have uploaded fragments.tsv (or tsv.gz with corresponding tabix index files) files, as these errors may also be due to differences between our software environments.
Best, Jay
@jayluo2 Hi Jay, thank you so much for the information! I just finished some personal things, and went back to lab in this week. Will keep you updated in this week. Songpeng
Hi @beyondpie, thank you for providing the BAM files! I would like to confirm two items:
- Are the BAM files deduplicated?
- Could you please verify the checksums (md5sum) of the BAM files (attached)?
Best, Jay
@jayluo2 I confirmed that the md5sums are same. I will remove the bams now, and check the question you. Thanks! Songpeng
Thank you for confirming!
Best, Jay
@jayluo2
Songpeng
@jayluo2 should I close the issue? Songpeng
Yes, thank you!
Hi @beyondpie, thank you for providing the BAM files! I would like to confirm two items:
Best, Jay
catlas2023_bam_md5sums.txt