Dynamograph RoadMap - GitHub Issues

alberskib commented 10 years ago

19.05 : 08.06 - First iteration GO:

Steps:

[x] design the initial table layout for GO
[x] get to know the scarph library
[x] design the (initial) Scala model for GO
[x] investigate the usefulness of the existing GO parser
[x] use an existing parser or write a new (custom) one for GO

Artifacts:
[x] document describing the initial table layout for GO
[x] Scala model for GO (code)
[x] DynamoDB code for creating the GO tables (according to the design), as well as code for retrieving GO data from DynamoDB
[x] tests for the code
[x] exemplary GO data saved into DynamoDB
[x] parser for GO

09.06 : 29.06 - Second iteration ncbiTaxonomy:

Steps:
[x] improve things related to GO
[x] design the initial table layout for ncbiTaxonomy - get to know the data
[x] design the initial Scala model for ncbiTaxonomy
[x] investigate the usefulness of the existing ncbiTaxonomy parser
[x] use an existing parser or write a new (custom) one for ncbiTaxonomy

Artifacts:
[ ] document describing the table layout for ncbiTaxonomy
[x] Scala model for ncbiTaxonomy (code)
[ ] DynamoDB code for creating the ncbiTaxonomy tables (according to the design), as well as code for retrieving ncbiTaxonomy data from DynamoDB
[ ] tests for the code
[ ] exemplary ncbiTaxonomy data saved into DynamoDB
[x] parser for ncbiTaxonomy

30.06 : 13.07 - Third iteration RefSeq:

Steps:
[ ] design the table layout for RefSeq - get to know the data
[ ] figure out how to connect S3 with DynamoDB
[ ] design a Scala model for RefSeq that handles S3 queries as well as DynamoDB queries
[ ] search for a suitable parser for RefSeq data or build a custom solution

Artifacts:
[ ] document describing the table layout for RefSeq and the cooperation of S3 with DynamoDB for RefSeq
[ ] Scala model for RefSeq (code)
[ ] DynamoDB code for creating the RefSeq tables (according to the design), cooperating with S3 and handling special cases
[ ] tests for the code
[ ] exemplary RefSeq data saved into DynamoDB
[ ] parser for RefSeq

14.07 : 27.07 - Fourth iteration UniRef: // or further work on steps/artifacts from the previous iteration

Steps:
[ ] design the table layout for UniRef - get to know the data
[ ] design the Scala model for UniRef
[ ] investigate the usefulness of the existing UniRef parser
[ ] use an existing parser or write a new (custom) one for UniRef

Artifacts:
[ ] document describing the table layout for UniRef
[ ] Scala model for UniRef (code)
[ ] DynamoDB code for creating the UniRef tables (according to the design), as well as code for retrieving UniRef data from DynamoDB
[ ] tests for the code
[ ] exemplary UniRef data saved into DynamoDB
[ ] parser for UniRef

28.07 : 10.08 - Fifth iteration:

Steps:
[ ] execute performance tests
[ ] introduce improvements
[ ] evaluate the solution with mentors
[ ] introduce suggestions after the evaluation
[ ] prepare usage examples

Artifacts:
[ ] report from the performance tests
[ ] document describing the evaluation of the solution, with places to improve
[ ] solution draft
[ ] documentation draft
[ ] examples showing how to use the solution

11.08 : 18.08 - Final delivery/release

Steps:
[ ] prepare packages
[ ] scrub code and documentation

Artifacts:
[ ] project documentation
[ ] working solution
[ ] GO, ncbiTaxonomy, UniProtKB and UniRef data stored in DynamoDB

Each iteration also focuses on code quality (including refactoring etc.).

alberskib commented 10 years ago

@bio4j/dynamograph Please take a look at the roadmap and give your opinion. Do you think the presented plan is reasonable, i.e. does it contain too many aims or not enough? I added UniProtKB and UniRef as additional data types, but if you think there is some better data I will replace them.

laughedelic commented 10 years ago

@alberskib I just added checkboxes for the current period so that the current progress is more visible; please check off what is already more or less done.

eparejatobes commented 10 years ago

Looks good in general :)

I'd add AWS resource management in general (create and destroy tables, autoscaling groups, etc.). As for the datasets, UniProt and all that is maybe too much, and I think that RefSeq could be more interesting, also for seeing how a mixed DynamoDB/S3 solution performs (RefSeq includes a lot of sequence data that needs range access).
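To make the range-access requirement concrete, here is a minimal sketch of one way it could be handled: sequences are split into fixed-size chunks keyed by sequence id and chunk index, and a range query only touches the chunks that overlap it. This is an illustrative assumption, not something decided in this thread. The names `chunk_sequence`, `chunks_for_range`, `read_range` and the `CHUNK_SIZE` value are hypothetical, and a plain dict stands in for whatever store (S3 objects or DynamoDB items) would hold the chunks.

```python
from typing import Dict, List, Tuple

# Hypothetical chunk size; deliberately tiny so the example is easy to follow.
CHUNK_SIZE = 4

ChunkKey = Tuple[str, int]  # (sequence id, chunk index)


def chunk_sequence(seq_id: str, sequence: str,
                   chunk_size: int = CHUNK_SIZE) -> Dict[ChunkKey, str]:
    """Split a sequence into fixed-size chunks keyed by (seq_id, chunk index)."""
    return {
        (seq_id, i // chunk_size): sequence[i:i + chunk_size]
        for i in range(0, len(sequence), chunk_size)
    }


def chunks_for_range(start: int, end: int,
                     chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the chunk indices that overlap the half-open range [start, end)."""
    if start < 0 or end <= start:
        return []
    return list(range(start // chunk_size, (end - 1) // chunk_size + 1))


def read_range(store: Dict[ChunkKey, str], seq_id: str, start: int, end: int,
               chunk_size: int = CHUNK_SIZE) -> str:
    """Read sequence[start:end] by fetching only the overlapping chunks."""
    indices = chunks_for_range(start, end, chunk_size)
    if not indices:
        return ""
    # The dict lookup stands in for a fetch from the real chunk store.
    joined = "".join(store[(seq_id, i)] for i in indices if (seq_id, i) in store)
    offset = start - indices[0] * chunk_size
    return joined[offset:offset + (end - start)]


store = chunk_sequence("s1", "ACGTACGTAC")
print(read_range(store, "s1", 3, 9))
```

The trade-off in a layout like this is the chunk size: smaller chunks mean less wasted data per range read but more keys to fetch, which is the kind of thing the performance tests in the fifth iteration could measure.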

alberskib commented 10 years ago

@eparejatobes By AWS resource management, do you mean writing code that provides such functionality, or doing such things manually? Mixing DynamoDB with S3 seems extremely interesting, so I will definitely handle this dataset. I have slightly modified the roadmap.

alberskib commented 10 years ago

If you know of any other datasets that should be handled, please let me know (and, more generally, if you would suggest any change to the selected datasets).

eparejatobes commented 10 years ago

@alberskib I mean code of course, like what we talked about during our previous meeting. @evdokim can probably show you some examples.

eparejatobes commented 10 years ago

We're taking the midterm evaluation as an opportunity for refining and updating this. Some comments about it:

We are deprecating RefSeq in Bio4j (this has not been announced yet, but it will be in the next few days), so it would make no sense to work with it here. ENA would be the equivalent integrated resource.

We should start creating issues and milestones for the next steps, and keep more detailed track of them.

bio4j / dynamograph

Dynamograph RoadMap #12
