cancerit / NanoSeq

Analysis software for Nanorate Sequencing (NanoSeq) experiments
GNU Affero General Public License v3.0
12 stars, 8 forks

results.cov.bed.gz last line #84

Open gevro opened 7 months ago

gevro commented 7 months ago

Hi, the NanoSeq pipeline v3.2.1 has a minor bug: it writes the following as the last line of results.cov.bed.gz:

    0   1   ;;0 

I don't know yet whether this also happens in v3.5.4, but I'm letting you know in case the bug persists there.

fa8sanger commented 7 months ago

Hi,

I think that was fixed at some point, though I'm not sure in which version. If the problem persists in the newer version, please let me know. Thank you

Federico

-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA

gevro commented 7 months ago

Hi, this bug has actually not been fixed in v3.5.4. Here is the last line of one results.cov.bed.gz:

    740435584   740435585   ;;0

I noticed it again because it makes the tabix call in the 'post' step fail, which crashes the pipeline:

Executing: tabix -f sampleID/tmpNanoSeq/post/results.cov.bed.gz

Traceback (most recent call last):
  File "/opt/wtsi-cgp/bin/runNanoSeq.py", line 1146, in <module>
    runCommand(cmd)
  File "/opt/wtsi-cgp/bin/runNanoSeq.py", line 430, in runCommand
    raise ValueError(error)
ValueError: [E::hts_idx_check_range] Region 740435584..740435585 cannot be stored in a tbi index. Try using a csi index
tbx_index_build failed: sampleID/tmpNanoSeq/post/results.cov.bed.gz
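A likely reason the crash only appears with this record, offered as a sketch rather than a confirmed diagnosis: the classic .tbi index can only address positions up to 2^29 (536,870,912), so the stray record's coordinates (740,435,584..740,435,585) are out of range for tbi, whereas the "0 1 ;;0" record reported for v3.2.1 would pass this particular range check. The snippet below only does that arithmetic. The constants (min_shift 14, five bin levels) are assumed from htslib's tbi defaults, and fits_in_tbi is a hypothetical name.

```python
# Sketch: can a BED interval be stored in a tabix .tbi index?
# Assumption: tbi uses min_shift=14 and 5 bin levels (htslib defaults), so the
# largest addressable position is 2**(14 + 3*5) = 2**29 = 536,870,912.
TBI_MIN_SHIFT = 14
TBI_N_LEVELS = 5
TBI_MAX_POS = 1 << (TBI_MIN_SHIFT + 3 * TBI_N_LEVELS)


def fits_in_tbi(start: int, end: int) -> bool:
    """Return True if both coordinates are within the tbi-addressable range."""
    return start <= TBI_MAX_POS and end <= TBI_MAX_POS


if __name__ == "__main__":
    # Stray record from partition 32 (empty chromosome column): out of range.
    print(fits_in_tbi(740435584, 740435585))
    # Stray record reported for v3.2.1: within range, so tabix does not hit this check.
    print(fits_in_tbi(0, 1))
```

If a genuine contig ever exceeded this size, tabix can build a .csi index instead, as the error message suggests; here, though, the coordinate comes from a malformed line that should not be in the file at all.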

It looks like partition 32 of the var output is the issue:

$ bgzip -cd 32.cov.bed.gz
    740435584   740435585   ;;0

Is there a reason why partition 32 of var is empty? I see there is a check in the 'post' code to see if files are 0 bytes:

if ( os.stat(ifile).st_size == 0 ) : continue

But this doesn't catch these files that have one malformed line.

As a workaround, I can add an awk filter to the 'post' step that removes any rows with a blank first column, but there is probably an upstream issue causing this. I think this workaround (an awk filter that keeps only lines with NF==4) will work until the upstream bug is fixed:

cmd += "bgzip -dc %s | awk \'NF==4\' >> %s ;" % (ifile, outFile)
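For illustration, the same filtering idea can be sketched in Python, reading each chunk by content rather than by size, so a compressed-but-empty chunk (non-zero bytes on disk) and a chunk holding only the malformed line are both handled. Assumptions: the chunks are tab-separated with four columns, and merge_cov_beds and is_valid_cov_line are hypothetical helpers, not functions from runNanoSeq.py. Unlike awk's default whitespace splitting (where a leading empty column makes NF 3), this version splits on tabs and rejects an empty chromosome explicitly.

```python
import gzip
import os
import tempfile
from io import StringIO
from typing import Iterable, TextIO

# Assumption: coverage chunks are tab-separated with 4 columns
# (chromosome, start, end, and a fourth annotation column).
EXPECTED_FIELDS = 4


def is_valid_cov_line(line: str) -> bool:
    """Keep a line only if it has exactly 4 tab-separated fields and a non-empty chromosome."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) == EXPECTED_FIELDS and fields[0] != ""


def merge_cov_beds(paths: Iterable[str], out: TextIO) -> int:
    """Append the lines of each gzip/bgzip-compressed chunk to `out`, skipping
    malformed lines, and return how many lines were dropped."""
    dropped = 0
    for path in paths:
        with gzip.open(path, "rt") as handle:  # BGZF files are valid gzip streams
            for line in handle:
                if is_valid_cov_line(line):
                    # Guarantee a newline so the next chunk does not run on.
                    out.write(line if line.endswith("\n") else line + "\n")
                else:
                    dropped += 1
    return dropped


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    good = os.path.join(tmpdir, "31.cov.bed.gz")
    bad = os.path.join(tmpdir, "32.cov.bed.gz")
    with gzip.open(good, "wt") as f:
        f.write("chr1\t100\t101\tex\n")  # illustrative record; 4th column made up
    with gzip.open(bad, "wt") as f:
        f.write("\t740435584\t740435585\t;;0\n")  # the stray record from chunk 32
    merged = StringIO()
    print(merge_cov_beds([good, bad], merged), repr(merged.getvalue()))
```

The merged text is uncompressed; it would still need to be recompressed with bgzip before running tabix, since tabix requires BGZF and Python's gzip module writes plain gzip.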
fa8sanger commented 7 months ago

Hi, thank you for your help debugging this. A couple of questions to understand it better:

1) When you ran v3.5.4, did you rerun the entire pipeline or just the var/indel/post steps?
2) Is there any error message for var.32?

I recall running into this situation once, which is why I thought we had fixed it. But the fix may simply have been to regenerate the table with dsa rather than to modify the pipeline (filesystems sometimes do strange things with files). I am not sure. We will investigate this.

Thank you again

gevro commented 7 months ago

1) When we reran the pipeline with v3.5.4, we started from the beginning: cov, part, dsa, var, etc. So the issue is still present in v3.5.4.

2) I just checked the logs for the 2 samples that had this issue. I don't see any error messages for chunk 32 in the var or dsa steps. Interestingly, in both samples chunk 32 is executed first in both dsa and var; I'm not sure why, but maybe that has something to do with it. In both samples the chunk also spans chrY regions. My guess is that there are no mutations or candidate mutations in that chunk, which leaves some data variable or structure empty in the upstream code and leads to this issue. If so, the coverage calculations and trinucleotide background information for that chunk may also be wrong.

Note, this happened for 2 out of ~30 samples.

gevro commented 7 months ago

Just FYI: v3.5.5 still has this bug.

fa8sanger commented 7 months ago

Hi, I am sorry the problem persists. It’s hard to know from here what that may be. Could you check the output of dsa for that chunk (#32)? Is it an empty file?

In the meantime, since this is a rare error, a temporary solution would be to manually edit the corresponding cov.bed file and remove that line.

gevro commented 7 months ago

Sorry, I already deleted the temp folder. But when I checked last time, there were no dsa or var errors for the problematic chunk.

As a workaround, I changed the 'post' step script to keep only lines with 4 fields (using awk) when the BED coverage files are merged.

fa8sanger commented 7 months ago

I think a quick fix would be to modify variantcaller.cc as follows:

Line 503: change "else" to "else if" (possibly not needed):

    } else if(cov>0) {

Line 536, most likely the problematic line: add "and cov > 0" to the IF clause:

    if(this->outfile_coverage != NULL and curr != -1 and cov > 0) {
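To see why the suggested change should be narrow, here is a simplified, hypothetical model of the guard condition on line 536, written in Python purely for illustration (writes_coverage and its arguments are not names from variantcaller.cc). Adding cov > 0 only changes the outcome when a position is set (curr != -1) but its coverage is zero, which appears to be the state behind the stray record ending in ";;0"; a chunk that never reaches that state should produce the same output as before.

```python
def writes_coverage(has_outfile: bool, curr: int, cov: int, patched: bool) -> bool:
    """Hypothetical model of the IF clause guarding the coverage write.
    `patched` adds the suggested `cov > 0` condition."""
    condition = has_outfile and curr != -1
    if patched:
        condition = condition and cov > 0
    return condition


if __name__ == "__main__":
    # Enumerate a few states and flag where the patch changes behaviour.
    for curr, cov in [(-1, 0), (-1, 5), (740435584, 0), (740435584, 5)]:
        before = writes_coverage(True, curr, cov, patched=False)
        after = writes_coverage(True, curr, cov, patched=True)
        print(curr, cov, before, after, "changed" if before != after else "")
```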

Would you be so kind as to try this fix on your problematic sample?

Thank you

gevro commented 7 months ago

Will try that. However, if that is the fix, will we need to rerun the pipeline on all the samples? That is, does this fix change the final coverage BED or the variant calls relative to my workaround of adding the following awk filter to the 'post' step?

cmd += "bgzip -dc %s | awk \'NF==4\' >> %s ;" % (ifile, outFile)

gevro commented 7 months ago

Also, do you have brief instructions for compiling after cloning the repository into the Docker container? I tried playing around with the Makefile and the scripts in the build directory but ran into issues. Alternatively, is there a simple way to compile only variantcaller.cc?

fa8sanger commented 7 months ago

I didn’t see your workaround, sorry. The fix I recommended is intended to solve the problem and should only affect your problematic sample.

The way I run the pipeline myself is by downloading the code and running ./setup.sh [INSTALLATION_PATH]

Sorry, I cannot be of much help with Docker.

gevro commented 7 months ago

I've compiled and am testing the bug fix (I changed the two lines in variantcaller.cc as you suggested), but I want to make sure it is an accurate fix.

I can see now that the 32.cov.bed.gz is empty. However, the 32.var has:

Does the final line (Coverage = 11060) indicate that there should be coverage in 32.cov.bed.gz? If so, why is 32.cov.bed.gz now empty? I don't know exactly what these lines encode.

Thanks

gevro commented 7 months ago

Also one more potential issue. I saved md5sums of all the var outputs before the bug fix, and I'm comparing to the md5sums of those outputs after the bug fix. The md5sums of all the cov.bed.gz files and all the .var files are different after the bug fix. But I would have expected a change only for the problematic last chunk #32 files, no?
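One thing worth ruling out when comparing checksums of compressed outputs: the MD5 of a .gz file depends on the compressed bytes (header fields, compression settings), not only on the data inside, so two files with identical content can hash differently. This is a general caveat, not a diagnosis of the differences reported here, and it would not explain differences in uncompressed .var files. The sketch below compares decompressed content instead; content_md5 and diff_dirs are hypothetical helpers, and the directory layout is assumed.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def content_md5(path: Path) -> str:
    """MD5 of the decompressed data for *.gz files, of the raw bytes otherwise."""
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.md5()
    with opener(path, "rb") as handle:
        # Read in 1 MiB blocks so large coverage files are not loaded at once.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def diff_dirs(before: Path, after: Path) -> Dict[str, str]:
    """Map each file name whose content differs (or exists on one side only) to a status."""
    names = {p.name for p in before.iterdir() if p.is_file()}
    names |= {p.name for p in after.iterdir() if p.is_file()}
    result: Dict[str, str] = {}
    for name in sorted(names):
        old, new = before / name, after / name
        if not (old.exists() and new.exists()):
            result[name] = "missing on one side"
        elif content_md5(old) != content_md5(new):
            result[name] = "content differs"
    return result


if __name__ == "__main__":
    # Same data compressed with different header timestamps: the raw bytes
    # differ, the decompressed content does not.
    old_dir, new_dir = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
    data = b"chr1\t100\t101\tex\n"  # illustrative record
    (old_dir / "1.cov.bed.gz").write_bytes(gzip.compress(data, mtime=0))
    (new_dir / "1.cov.bed.gz").write_bytes(gzip.compress(data, mtime=12345))
    print(diff_dirs(old_dir, new_dir))
```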

fa8sanger commented 7 months ago

Thanks very much for running the test. Could you share the 32.var file? It would help me understand. In principle, the fix I suggested should only avoid printing to cov.bed.gz when cov=0, so I don't understand why you are getting different md5s (I must be missing something).

fa8sanger commented 7 months ago

BTW, those burdens correspond to masked sites, so they don't go to the cov.bed.gz.

Here is a typical example:

    Burdens 0 0 38193241
    Burdens 0 1 10
    Burdens 1 0 350341
    Burdens 1 1 4
    Coverage 782808

The first number is the mask flag (0 means "not masked") and the second indicates whether or not the site is a variant. Coverage in this internal file refers to the number of positions of the genome with at least some coverage (masked or not masked).
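To check a chunk such as 32.var quickly, the sketch below parses summary lines in the Burdens/Coverage layout of the example above and totals the counts per mask state. Assumptions: fields are whitespace-separated, the first flag is the mask state (0 = not masked) and the second the variant flag, as described above, and the third number is a count whose exact unit is not stated in this thread; parse_var_summary and total are hypothetical names. If a chunk's unmasked totals were all zero, an empty cov.bed.gz despite a non-zero Coverage line would be consistent with the explanation above.

```python
from typing import Dict, Optional, Tuple

BurdenKey = Tuple[int, int]  # (mask flag, variant flag)


def parse_var_summary(text: str) -> Tuple[Dict[BurdenKey, int], Optional[int]]:
    """Collect 'Burdens <masked> <variant> <count>' and 'Coverage <n>' lines."""
    burdens: Dict[BurdenKey, int] = {}
    coverage: Optional[int] = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "Burdens":
            masked, variant, count = (int(x) for x in fields[1:])
            burdens[(masked, variant)] = burdens.get((masked, variant), 0) + count
        elif len(fields) == 2 and fields[0] == "Coverage":
            coverage = int(fields[1])
    return burdens, coverage


def total(burdens: Dict[BurdenKey, int], masked: int) -> int:
    """Sum counts over both variant flags for one mask state (0 = not masked)."""
    return sum(n for (m, _), n in burdens.items() if m == masked)


if __name__ == "__main__":
    example = "\n".join([
        "Burdens 0 0 38193241",
        "Burdens 0 1 10",
        "Burdens 1 0 350341",
        "Burdens 1 1 4",
        "Coverage 782808",
    ])
    burdens, coverage = parse_var_summary(example)
    print("unmasked:", total(burdens, 0))  # 38193241 + 10 = 38193251
    print("masked:", total(burdens, 1))    # 350341 + 4 = 350345
    print("Coverage line:", coverage)
```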

gevro commented 7 months ago

Sure, I will email it offline.

gevro commented 7 months ago

Hi, I tested variantcaller with the two changes you suggested, this time making the changes on the v3.5.5 version of variantcaller.cc.

This confirmed that all var output files are identical except for 32.cov.bed.gz.

32.cov.bed.gz in the old version:

$ zcat 32.cov.bed.gz
    0   1   ;?;0

32.cov.bed.gz in the new version is empty. But note that the file is still 20 bytes in size, not 0 bytes, because a compressed file has a minimum size even when it contains no data:

$ ls -lh 32.cov.bed.gz
-rw-rw-r--. 20 Feb 11 10:51 32.cov.bed.gz

So I think this change is safe to make. But note that I tested it with v3.5.5 variantcaller.cc, not with the ‘develop’ version. So I can’t vouch for how these changes will behave in the ‘develop’ branch version of variantcaller.cc.

Thanks

fa8sanger commented 7 months ago

All good then? Alex, could you incorporate those changes?

gevro commented 7 months ago

Yes all good as far as I can tell.

gevro commented 7 months ago

Also, is it possible to include just this change in the next version, without all the other pending 'develop' branch changes? I'm not sure how this fix interacts with the other pending develop branch changes, and I haven't tested that.

fa8sanger commented 5 months ago

Alex, would you be able to add these changes please? I had forgotten about this

gevro commented 2 months ago

Hi, I'm curious if this bug will be fixed in the next version?

fa8sanger commented 2 months ago

Hi, yes, and you already said you tested it successfully… Let me know if I am missing something

gevro commented 2 months ago

Yes, I tested it, but I didn't see it in the latest release (3.5.5). Is it in a dev branch that will be part of the next release?

fa8sanger commented 2 months ago

Sorry, you are right. I just went through the thread, and there was no confirmation that this had been incorporated.
