MolSSI / covid

MolSSI SARS-CoV-2 Biomolecular Simulation Data and Algorithm Store
https://covid.molssi.org
28 stars, 49 forks

Suggestions to ingest data automatically #69

Open jaimergp opened 4 years ago

jaimergp commented 4 years ago

@apayne97, @henriberger and I have been talking about ways to incorporate information from the Thorne Lab in a more automated way. We have come up with this "ideal" pipeline:

Tier 1) Create a script that diffs their PDB IDs against our PDB IDs and reports the set difference, so a human can review which new IDs are worth adding.

Tier 2) Create a GitHub Actions pipeline that does this automatically, either with an hourly cron job or, if technically possible, after every push to the Thorne Lab repo.

Tier 3) Add bot features to GHA to submit the PRs needed for each new candidate PDB ID. A human reviews each PR, edits the information as needed, and merges or rejects it. The closed PRs serve as a history of what we have tried, so we don't submit the same candidate twice.
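As a rough illustration of Tier 1 (and the "don't submit twice" part of Tier 3), here is a minimal Python sketch. The function names, the regex-based extraction, and the `already_tried` argument are assumptions made for this example, not anything that exists in either repo. The regex only matches classic 4-character PDB IDs and will also pick up look-alikes such as years, which is one more reason to keep a human in the loop.

```python
import re

# Classic 4-character PDB IDs: a digit 1-9 followed by three alphanumerics.
# Caveat: this also matches look-alikes such as "2020", so results need review.
PDB_ID_RE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def extract_pdb_ids(text):
    """Return the set of upper-cased PDB-like IDs found in free text."""
    return {match.upper() for match in PDB_ID_RE.findall(text)}


def _normalize(ids):
    """Strip whitespace and upper-case every ID so comparisons are case-insensitive."""
    return {pdb_id.strip().upper() for pdb_id in ids}


def candidate_ids(upstream_ids, our_ids, already_tried=()):
    """Return sorted IDs that are upstream but neither in our store nor already reviewed.

    `already_tried` is a hypothetical list of IDs recovered from closed PRs.
    """
    return sorted(
        _normalize(upstream_ids) - _normalize(our_ids) - _normalize(already_tried)
    )
```

For example, `candidate_ids(["6lu7", "6vxx"], ["6LU7"])` reports only `6VXX` as a candidate for review.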

Let us know if you have feedback!

Lnaden commented 4 years ago

I like this idea. The first one would not be too hard to do. The second one could be done relatively easily, but I would want to be careful about pinging everyone watching this repo every time it makes a PR. I have the same concern about the third. I also don't think I see the difference between 2 and 3; could you elaborate?

jaimergp commented 4 years ago

Option (2) only notifies a selected pool of users, say by writing a comment on a specific issue.
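To make Option (2) a bit more concrete, here is a small Python sketch that builds (but does not send) a request for the GitHub REST endpoint that creates an issue comment. The helper names, the reviewer list, and the token handling are assumptions for illustration. Sending would be a matter of passing the request to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_comment_body(new_ids, reviewers):
    """Compose a Markdown comment that @-mentions reviewers and lists the new IDs."""
    mentions = " ".join(f"@{name}" for name in reviewers)
    lines = [f"{mentions} new candidate PDB IDs found upstream:".strip()]
    lines += [f"- {pdb_id}" for pdb_id in new_ids]
    return "\n".join(lines)


def build_comment_request(owner, repo, issue_number, body, token):
    """Build a POST request for the 'create an issue comment' REST endpoint.

    The request is only constructed here; nothing is sent over the network.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Only people mentioned in the comment or subscribed to that one issue would be notified by the comment itself, which is the point of routing everything through a single tracking issue.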

Option (3) would create the appropriate PRs (one per PDB ID?), each with an automatically generated file template filled in with the new information from upstream.
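The "automatically generated file template" could look something like the following Python sketch. The field names (`pdbid`, `description`, `source`) and the placeholder defaults are made up for this example and do not reflect the actual file schema used in this repo. The idea is just that the bot pre-fills what it knows and leaves obvious placeholders for the reviewer.

```python
from string import Template

# Hypothetical key-value layout; the real file schema in the repo may differ.
STRUCTURE_TEMPLATE = Template(
    "pdbid: $pdb_id\n"
    "description: $description\n"
    "source: $source\n"
)


def render_structure_file(pdb_id, description="TODO", source="TODO"):
    """Fill the template with upstream information, leaving TODOs for the reviewer."""
    return STRUCTURE_TEMPLATE.substitute(
        pdb_id=pdb_id.strip().upper(),
        description=description,
        source=source,
    )
```

Anything the bot cannot determine stays as `TODO`, so the human reviewing the PR can see at a glance what still needs editing before merging.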

About the notification noise: I guess we could have a fork of this repo somewhere else where those branches are created, and then it's up to the human(s) to create the PR or not? I am not sure I like that, though. I am inclined to say I don't.

I don't know whether the API offers a way to notify only selected people, but if you are subscribed to this repo, you'll get everything anyway.