Shians / NanoMethViz

https://shians.github.io/NanoMethViz/
Apache License 2.0

Modbam_to_tabix with 0-based coordinate system #39

Open Grigory-Kornienko opened 3 months ago

Grigory-Kornienko commented 3 months ago

Hi Shian!

I have a few BAM files aligned to a UCSC genome build. UCSC uses a 0-based coordinate system, unlike the Ensembl genome builds.

I have my ModBamResult object and I am trying to use modbam_to_tabix to convert it to tabix format. With Ensembl-aligned BAM files everything works, but the BAM files for my data are aligned to UCSC. Is there a way to make this function work for 0-based coordinate systems?

I get the following output:

methy <- modbam_to_tabix(mbr, "res.tsv.bgz")
✔ Data converted: sample1_sorted.bam [8m 15.3s]
✔ Data converted: sample2_sorted.bam [4m 34s]
✔ Data converted: sample3_sorted.bam [9m 51.9s]
✔ Data converted: sample4_sorted.bam [13m 1.7s]
✔ Data converted: sample5_sorted.bam [10m 52s]
✔ Data converted: sample6_sorted.bam [8m 56.3s]
✔ Converting data to TSV [55m 31.6s]
✔ Sorting data [12m 21.8s]
...
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[E::hts_idx_push] Unsorted positions on sequence #10: 248945815 followed by 1
Error in value[3L] : index build failed file: res.tsv.bgz
✖ Compressing data [5m 1.2s]

Thanks a lot for the help! :)
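
For context on the "Coordinate <= 0" warning in the log above: tabix reads the position column as 1-based unless its -0 option is given, so a position of 0 is reported as invalid. The sketch below only illustrates how the two conventions relate, 0-based half-open (as in BED) versus 1-based inclusive (as in SAM text). The function names are hypothetical and are not part of NanoMethViz.

```python
def zero_to_one_based(start0: int, end0: int) -> tuple[int, int]:
    """Convert a 0-based half-open interval [start0, end0) to 1-based inclusive."""
    if start0 < 0 or end0 <= start0:
        raise ValueError("expected 0 <= start0 < end0")
    # Only the start shifts; the half-open end equals the inclusive end.
    return start0 + 1, end0


def one_to_zero_based(start1: int, end1: int) -> tuple[int, int]:
    """Convert a 1-based inclusive interval [start1, end1] to 0-based half-open."""
    if start1 < 1 or end1 < start1:
        raise ValueError("expected 1 <= start1 <= end1")
    return start1 - 1, end1


# The first base of a chromosome is [0, 1) in 0-based half-open terms
# and [1, 1] in 1-based inclusive terms.
first_base = zero_to_one_based(0, 1)
```

Under this reading, a single-base record at 0-based position 0 corresponds to 1-based position 1, which is the smallest value tabix accepts without -0.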

Shians commented 3 months ago

As far as I know, BAM should always be 0-based and SAM always 1-based, so I think the issue occurred elsewhere. How large is the resultant TSV file, and how much memory do you have? I think sort sometimes does funny things when there isn't enough memory and corrupts the file.

Also, was this run on a Linux, Mac or Windows system?
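
One way to check whether the sorted file itself is at fault is to scan it for the two conditions in the log: a position <= 0, and a position that decreases within the same sequence. The following is a minimal sketch, not NanoMethViz code. It assumes a headerless tab-separated file with the chromosome in the second column and the position in the third; adjust `chrom_col` and `pos_col` if the layout differs. `scan_positions` and `Problem` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class Problem(NamedTuple):
    line_no: int   # 1-based line number in the input
    kind: str      # "non_positive" or "unsorted"
    chrom: str
    pos: int


def scan_positions(
    lines: Iterable[str], chrom_col: int = 1, pos_col: int = 2
) -> Iterator[Problem]:
    """Yield records whose position is <= 0 or smaller than the previous
    position on the same chromosome. Column indices are 0-based and are
    an assumption about the file layout."""
    last_chrom = None
    last_pos = None
    for line_no, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        chrom = fields[chrom_col]
        pos = int(fields[pos_col])
        if pos <= 0:
            yield Problem(line_no, "non_positive", chrom, pos)
        if chrom == last_chrom and pos < last_pos:
            yield Problem(line_no, "unsorted", chrom, pos)
        last_chrom, last_pos = chrom, pos


# Usage on a bgzip-compressed file (Python's gzip module reads bgzip output):
#
#   import gzip
#   with gzip.open("res.tsv.bgz", "rt") as handle:
#       for problem in scan_positions(handle):
#           print(problem)
```

The scan streams the file line by line, so it does not need to hold the TSV in memory. It only compares neighbouring records, so it flags where the order breaks rather than listing every misplaced record.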

Grigory-Kornienko commented 3 months ago

Hi Shian, Thanks for the quick reply.

The TSV file is 35.8 GB. I am running it from RStudio on an HPC cluster, and I allocated 64 GB of memory.

Shians commented 3 months ago

I think there may be an issue in my conversion function. I will let you know when I have updated with a fix.

Grigory-Kornienko commented 3 months ago

Hello Shian,

The BAM files that I used had alignment and methylation calls done by Dorado v.7. When I do it with Dorado v.5, the modbam_to_tabix function works. I am currently re-basecalling my nanopore data with different options for Dorado v.7, as I suspect that the issue lies there. I will keep you updated on how it goes.

Best, Grigory