38 / d4-format

The D4 Quantitative Data Format
MIT License

WIP: solidify start #63

Open brentp opened 2 years ago

brentp commented 2 years ago

d4 uses region syntax like chr1:1-100, which is generally treated as 1-based and end-inclusive, so this region translates to chr1\t0\t100 in BED format. The internals of d4 are unclear to me, but I think this change is a start.
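To make the coordinate shift concrete, here is a minimal sketch of the conversion described above. `region_to_bed` is a hypothetical helper written for illustration, not a function from the d4 codebase, and it assumes the region string is 1-based and end-inclusive:

```rust
/// Hypothetical helper (not part of d4): parse a 1-based, end-inclusive
/// region string such as "chr1:1-100" into 0-based, half-open BED
/// coordinates (chrom, start, end).
fn region_to_bed(region: &str) -> Option<(String, u64, u64)> {
    // Split on the last ':' so contig names that contain ':' still parse.
    let (chrom, range) = region.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    // A 1-based start of 0 is invalid, as is an end before the start.
    if start == 0 || end < start {
        return None;
    }
    // Only the start shifts: the inclusive 1-based end is numerically
    // equal to the exclusive 0-based end.
    Some((chrom.to_string(), start - 1, end))
}
```

Under these assumptions, chr1:1-100 becomes (chr1, 0, 100), matching the BED line above.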

@38 could you have a look at the changes in ssio/reader.rs? The break might not work if things are not sorted (though they appear to be). The change to the if statement at lines 136/137 is related to the +/- 1 used internally within d4 (which I don't yet understand).
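The concern about the break can be illustrated with a generic sketch. This is not the code in ssio/reader.rs; `overlapping` is a hypothetical function, and the intervals are assumed to be 0-based, half-open (start, end) pairs sorted by start:

```rust
/// Hypothetical illustration (not d4's reader): collect the intervals that
/// overlap the query [start, end). Correct only if `intervals` is sorted
/// by start position.
fn overlapping(intervals: &[(u64, u64)], start: u64, end: u64) -> Vec<(u64, u64)> {
    let mut hits = Vec::new();
    for &(s, e) in intervals {
        if s >= end {
            // With sorted input, no later interval can overlap, so stop early.
            // With unsorted input, this would skip valid intervals.
            break;
        }
        if e > start {
            hits.push((s, e));
        }
    }
    hits
}
```

With sorted input the early exit is safe. If an interval starting past the query comes first in unsorted input, the loop stops before reaching intervals that do overlap.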

This change passes all tests and fixes the problems seen in #59.

brentp commented 2 years ago

I think the code part of this is ready to merge (just had to find where bed intervals were converted to regions).

mrvollger commented 1 year ago

@arq5x @38 can this be merged and a new release cut to fix #59, or are there more steps that need to be taken first? Maybe I can help?

brentp commented 1 year ago

Hi @mrvollger, you could test this, but we really need Hao (@38) to have a look, as I don't fully understand the data structures used here.

mrvollger commented 1 year ago

Hi @brentp, I understand, and thanks for making a first pass! I tested your changes this morning and they fix the issue I have been having with my files. For now I will use your source and wait for the final word from @38.

Thanks!

38 commented 1 year ago

Hi @brentp, just to let you know, I've fixed issue #59 in this commit: https://github.com/38/d4-format/commit/000f9e630f70844e72f6afb30f056afa596381c8

The expected output was all zero values. I explained roughly what is going on in issue #59.

I believe that for this PR you no longer need to change the ssio/reader.rs implementation. Simply subtracting one should do the job.

Please let me know if you have any questions.

Thanks, Hao