biocommons / hackathon-2023

Hackathon 2023 projects and planning.

Public seqrepo registry #8

Closed korikuzma closed 1 year ago

korikuzma commented 1 year ago

Submitter Name

Alex Wagner (@ahwagner)

Submitter Affiliation

Nationwide Children's Hospital

Requested By

Wagner Lab

Additional Submitter Details

No response

Lead(s)

LEAD NEEDED

biocommons Repo

seqrepo, seqrepo-rest-service

Project Details

A lab member has been loading his own sequences into seqrepo for MaveDB. Implementing SeqCol (the Sequence Collections specification) would provide a standardized way to check whether the collection of sequences used in a dataset (e.g. MaveDB variant maps using VRS) is supported by a given seqrepo instance. This has natural implications for using seqrepo as a federated service.

This would likely be implemented across both seqrepo (a method to check sequence collection compatibility) and seqrepo-rest-service (an implementation of the SeqCol API spec).

Skill Level

Intermediate

Required Skills

Python

ahwagner commented 1 year ago

I think we should build into SeqRepo the ability to quickly report or check compatibility of sequences hosted by a seqrepo instance, which would be a precursor step towards realizing a registry of seqrepo instances.

reece commented 1 year ago

> I think we should build into SeqRepo the ability to quickly report or check compatibility of sequences hosted by a seqrepo instance, which would be a precursor step towards realizing a registry of seqrepo instances.

Alex, can you please explain in more detail? What does it mean to "report or check compatibility of sequences"?

reece commented 1 year ago

As you said, creating a custom seqrepo is pretty easy. What are some specific goals for a registry? How would they be implemented (or is part of this project to figure that out)?

I'm looking for answers to questions like:

ahwagner commented 1 year ago

I would like to add support for the Sequence Collections specification, specifically for retrieving and comparing sequence collections.
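To make "comparing sequence collections" a bit more concrete, here is a rough sketch of the idea, not the SeqCol API itself. It assumes each sequence is identified by a GA4GH-style `sha512t24u` digest (SHA-512, truncated to 24 bytes, base64url-encoded) and that a collection is a plain dict of parallel arrays. The names `make_collection` and `compare_collections`, and the shape of the returned summary, are hypothetical and only loosely modeled on the spec.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 chars)."""
    digest = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def make_collection(named_sequences: dict) -> dict:
    """Build a simplified collection (hypothetical layout): parallel arrays
    of names, lengths, and sequence digests."""
    names = list(named_sequences)
    return {
        "names": names,
        "lengths": [len(named_sequences[n]) for n in names],
        "sequences": [sha512t24u(named_sequences[n].encode("ascii")) for n in names],
    }


def compare_collections(a: dict, b: dict) -> dict:
    """Summarize which sequence digests are shared between two collections.
    Names are ignored, so the same sequence under two names still matches."""
    a_set, b_set = set(a["sequences"]), set(b["sequences"])
    return {
        "a_count": len(a["sequences"]),
        "b_count": len(b["sequences"]),
        "shared": sorted(a_set & b_set),
        "a_only": sorted(a_set - b_set),
        "b_only": sorted(b_set - a_set),
    }


# Toy data: one sequence is common to both collections under different names.
dataset = make_collection({"target_1": "ACGT", "target_2": "GGCC"})
instance = make_collection({"custom_a": "ACGT", "custom_b": "TTTT"})
summary = compare_collections(dataset, instance)
print(summary["a_only"])  # digests the dataset needs that the instance lacks
```

Comparing by digest rather than by name is what would let a dataset and a seqrepo instance agree on a sequence even if they label it differently.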

What I would like to see is a standardized interface for checking if a set of sequences (e.g., sequences collected from VRS objects in a resource) is supported by a seqrepo instance. This is part of the vision of allowing users to load custom sequences into seqrepo and use those sequences to report VRS variants, as we have done for our work with the Atlas of Variant Effects Alliance (AVE).

If that instance (or another instance containing those sequences) is part of a federated network of seqrepo instances, it can report in a standardized way that the sequence collection is retrievable, or tell the user which sequences are not retrievable.
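As a minimal sketch of that federated check (under stated assumptions, not an existing seqrepo or seqrepo-rest-service interface): each instance is modeled as an in-memory set of sequence identifiers, and the check reports where each required sequence can be found and which ones are missing everywhere. `Instance`, `check_retrievable`, and the identifiers below are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Stand-in for one seqrepo instance: a name plus the identifiers it serves."""
    name: str
    identifiers: set = field(default_factory=set)

    def has(self, identifier: str) -> bool:
        return identifier in self.identifiers


def check_retrievable(required, instances):
    """For each required identifier, list the instances that hold it;
    collect identifiers that no instance holds under "missing"."""
    found = {}
    missing = []
    for identifier in required:
        holders = [inst.name for inst in instances if inst.has(identifier)]
        if holders:
            found[identifier] = holders
        else:
            missing.append(identifier)
    return {"retrievable": not missing, "found": found, "missing": missing}


# Placeholder identifiers, not real digests.
network = [
    Instance("public", {"seq-1", "seq-2"}),
    Instance("lab-custom", {"seq-2", "seq-3"}),
]
report = check_retrievable(["seq-1", "seq-3", "seq-4"], network)
print(report["missing"])  # sequences no instance in the network can serve
```

A real version would presumably query each instance over its REST API rather than holding identifier sets in memory, but the shape of the report (retrievable or not, plus the list of missing sequences) is the part a registry would need to standardize.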

korikuzma commented 1 year ago

This will not be worked on at the hackathon. It can be picked up afterwards in a new issue in the respective repo.