biocommons / hackathon-2023

Hackathon 2023 projects and planning.

Update SeqRepo and automate future updates #5

Open korikuzma opened 1 year ago

korikuzma commented 1 year ago

Submitter Name

@reece

Submitter Affiliation

MyOme

Requested By

Everyone using SeqRepo

Lead(s)

@reece

biocommons Repo

seqrepo

Project Details

Hackathon Project Slide

SeqRepo data has not been updated since Jan 29, 2021. Instructions for updating SeqRepo are here.

The goals for this project are:

korikuzma commented 1 year ago

@reece @andreasprlic Would you be able to provide any additional information on this?

andreasprlic commented 1 year ago

The seqrepo update procedure is closely tied to the UTA update procedure. Can we merge this issue with #6?

korikuzma commented 1 year ago

@andreasprlic That’s fine with me. I know these are two issues that are highly wanted. @reece are you ok with this?

reece commented 1 year ago

@andreasprlic in what sense do you see these as "tied"?

They are pretty different in terms of mechanism, data sources, complexity, and reliability. With the exception of the tools that we might choose to automate the process, I don't think that lessons from one will inform the other.

So, I'd prefer to keep them separate. It's easier to compose from pieces than to disentangle a monolith.

andreasprlic commented 1 year ago

Currently we have a manual update procedure that includes steps for both UTA and seqrepo. When I saw this ticket to "automate" future updates, I was under the impression that the plan might be to wrap the steps of the manual procedure as a "workflow". As such, an update of both UTA and seqrepo would be done together.

Perhaps, as we build a workflow for updating the content, we could design this so each of the steps for UTA and seqrepo could get run independently and as a separate (parallel?) process. Perhaps that would make it meet what you are expecting?
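A minimal sketch of that idea, not the actual update procedure: `update_seqrepo` and `update_uta` below are hypothetical placeholders for the real steps, and `run_updates` is an illustrative helper name. It uses threads for brevity; separate processes or workflow jobs would follow the same shape.

```python
from concurrent.futures import ThreadPoolExecutor


def update_seqrepo() -> str:
    # Hypothetical placeholder: the real SeqRepo update steps would go here.
    return "seqrepo updated"


def update_uta() -> str:
    # Hypothetical placeholder: the real UTA update steps would go here.
    return "uta updated"


def run_updates(steps):
    """Run each independent step in its own worker and record its outcome."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(func) for name, func in steps.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:
                # A failure in one step is recorded without hiding the other.
                results[name] = ("failed", str(exc))
    return results


if __name__ == "__main__":
    print(run_updates({"seqrepo": update_seqrepo, "uta": update_uta}))
```

Because each step is a separate entry, either one can also be run alone, e.g. `run_updates({"seqrepo": update_seqrepo})`.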

reece commented 1 year ago

Yes. Similar tooling for parallel workflows would be grand. Also, UTA really only depends on SeqRepo because UTA needed to realign to get CIGAR strings way back before NCBI GFFs existed. With the GFFs, we don't need to realign.

korikuzma commented 1 year ago

@reece, @larrybabb and I could co-lead if you are leading a separate project.