biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

FASTA indexing and slicing functionalities #387

NickRoz1 closed this 4 years ago

NickRoz1 commented 5 years ago

Hello!

Building on @emi80's awesome contributions, I've added FASTA indexing and slicing functionality to Sambamba. Because the BioD submodule has been detached since commit 28ed6e382e, merging this pull request will also require either updating the BioD version or manually copying fasta.d and fai.d into sambamba/BioD/bio/std/file.

george-githinji commented 5 years ago

Just a note: @emi80's contribution has already been merged into the BioD master branch.

pjotrp commented 5 years ago

Yes, sorry, we are still organising BioD and synchronising it with sambamba. It should be fixed soon (I am building ldc 1.14 first); we should synchronise after that.

pjotrp commented 5 years ago

@NickRoz1 great work, btw!

ycl6 commented 4 years ago

Hi, just wondering if the FASTA indexing functionality will be added in the near future?

pjotrp commented 4 years ago

We should merge it. @NickRoz1 or @emi80, do you want to add a few test files? Without tests it is probably not a good idea to merge. I think there are some tests in BioD; I can have a look.

NickRoz1 commented 4 years ago

For slice.d, I think it would be better to edit the BioD fastaRegions function so that the FASTA index file can be specified separately, e.g.:

auto fastaRegions(string filename, string pathToIndex, string[] queries) {
    File f = File(filename);
    FaiRecord[string] index = makeIndex(readFai(pathToIndex));
    Region[] regions = to!(Region[])(queries);
    return fetchFastaRegions(f, index, regions);
}

Instead of:

auto fastaRegions(string filename, string[] queries) {
    File f = File(filename);
    FaiRecord[string] index = makeIndex(readFai(filename ~ ".fai")); // index path is always derived from the FASTA path
    Region[] regions = to!(Region[])(queries);
    return fetchFastaRegions(f, index, regions);
}
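For context on what the index passed to fetchFastaRegions has to provide: each line of a samtools-style `.fai` file holds five tab-separated fields (sequence name, length in bases, byte offset of the first base, bases per line, bytes per line including the newline). The sketch below is a hypothetical, self-contained illustration of how one such record locates a 0-based base position in the FASTA file, which is what region slicing relies on. `FaiEntry`, `parseFaiLine` and `baseOffset` are made-up names for this example, not BioD's `FaiRecord`/`readFai` API, and the 60/61 line layout is an assumed example.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;
import std.string : strip;

/// One record of a samtools-style .fai index (hypothetical; not BioD's FaiRecord).
struct FaiEntry {
    string name;     // sequence name
    ulong length;    // number of bases in the sequence
    ulong offset;    // byte offset of the first base in the FASTA file
    ulong lineBases; // bases per FASTA line
    ulong lineWidth; // bytes per FASTA line, including the newline
}

/// Parse one tab-separated .fai line into a FaiEntry.
FaiEntry parseFaiLine(string line) {
    auto fields = line.strip.split('\t');
    enforce(fields.length >= 5, "a .fai line needs at least 5 fields");
    return FaiEntry(fields[0], fields[1].to!ulong, fields[2].to!ulong,
                    fields[3].to!ulong, fields[4].to!ulong);
}

/// Byte offset in the FASTA file of the 0-based base position `pos`:
/// skip whole lines (lineWidth bytes each), then the remaining bases.
ulong baseOffset(in FaiEntry e, ulong pos) {
    enforce(pos < e.length, "position past end of sequence");
    return e.offset + (pos / e.lineBases) * e.lineWidth + pos % e.lineBases;
}

void main() {
    import std.stdio : writeln;
    // Assumed layout: ">chr1\n" is 6 bytes, so the first base is at byte 6;
    // each line holds 60 bases and is 61 bytes wide with its '\n'.
    auto e = parseFaiLine("chr1\t1000\t6\t60\t61");
    writeln(baseOffset(e, 0));  // 6
    writeln(baseOffset(e, 60)); // 6 + 61 = 67 (first base of the second line)
}
```

Slicing a region then comes down to seeking to the offset of its first base and reading through the offset of its last base, dropping the newline bytes in between.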

> I think there are some tests in BioD. I can have a look.

There are tests for the underlying FASTA parsers, for example: https://github.com/biod/BioD/blob/7969eb0a847b05874e83ffddead26e193ece8101/bio/std/file/fai.d#L133

pjotrp commented 4 years ago

Thank you @NickRoz1 !