NHMDenmark / DaSSCo-Integration

This repo will include the integration of the DaSSCo storage from NorthTech.

Write MoSCoW for consuming the NorthTech API #1

Status: Open. bhsi-snm opened this issue 1 year ago.

bhsi-snm commented 1 year ago

**Is your feature request related to a problem? Please describe.**
A clear and concise description of the problem statement. Aim of the project.

**Describe the solution you'd like**
A clear and concise description of what you want to happen. What is considered a successful project? Include your MoSCoW here.

This is a template for categorizing requirements using the MoSCoW technique:

- Copy and paste all your requirements under the REQUIREMENTS column.
- Mark an X under the appropriate M, S, C, or W column depending on stakeholder preference.
- Focus on the requirements with the M mark before proceeding with S, C, and W, in that order (time and resources permitting).

| REQUIREMENTS | MUST (M) | SHOULD (S) | COULD (C) | WON'T (W) |
| --- | --- | --- | --- | --- |
| Feature |  |  |  |  |
| Feature |  |  |  |  |
| Feature |  |  |  |  |
| Support for --- |  |  |  |  |
| Support for --- |  |  |  |  |
| Compatibility --- |  |  |  |  |
| Language Support |  |  |  |  |

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered. Alternative approach you considered.

**Additional context**
Add any other context or screenshots about the feature request here.

Any other relevant information.

Baeist commented 1 year ago

MoSCoW suggestions for the integration server and for consuming the NorthTech (NT) API.

Requirements are marked MUST (M), SHOULD (S), COULD (C), or WON'T (W):

(S) Correct API documentation. (NT)

(M) Automated authorization: OAuth 2.0 setup for server requests.

(M) Connecting with other servers (SSH).

(S) Automating SSH connections. No password protocols.

(S) Decide which OAuth grant type to use. (Authorization Code Grant seems best.)

(C) Role checking. User, admin, developer, etc. Internal only? External user calls? Is this handled better somewhere else?

(M) Running/setting up virtual machine.

(W) Reboot system routine.

(C) Internal server error handling. Call out for help with GitHub notifications?

(W) Safety net for running processes. Persisting status.

(W) Maintenance routine check for files on server.

(W) Log system for server changes.

(S) Test calls to the API, Ndrive, and DaSSCo servers, through REST API/SSH.

(M) Continually running Python app for making API calls.

(M) New files ready check. Ask others? Check directly in directories? Can be event-based or time-based, or both?

(M) Create metadata asset through the API.

(S) Update metadata asset through the API.

(M) Functions for open/close/reopen SAMBA connections.

(C) Tracking system for SAMBA connections.

(M) Check files have been transferred to ERDA (DB). End storyline.

(S) Clean up concluded files.

(C) Understandable directory system. Documented.

(M) Event-based triggers for the app.

(S) Keep track of file status.

(W) Keep track of file statistics.

(M) Create storyline for files. Or another system to determine where each file should go before using the NT API.

(S) Check file/data validity.

(M) File transfer to and from connected servers (SAMBA, SFTP, HTTP).

(C) JSON data-field check. Documentation data-field table update.

(C) Logging/updating status of assets.

(C) User-friendly README.md for API usage.

(C) Developer-friendly README.md for server functionalities and code base.

(W) Dashboard functions made as a REST API instead, for hosting somewhere else?

(W) Dashboard with file status.

(W) Dashboard with file statistics.

(W) Dashboard direct user access to the API.

(W) Dashboard manual updates through the API.

(W) Dashboard creating an institution, etc., through the API.

(W) Dashboard getting institution, etc., lists through the API.

(W) Dashboard for auditing through the API.

(W) Documentation for integration server. Flowchart, data flow diagram (main scenarios).

(W) Documentation for integration server Python app. Class diagram, source code comments.

Baeist commented 1 year ago
| REQUIREMENTS | MUST (M) | SHOULD (S) | COULD (C) | WON'T (W) |
| --- | --- | --- | --- | --- |
| Correct API documentation. (NT) |  | X |  |  |
| (API auth) Automated authorization OAuth 2.0 setup for server requests. | X |  |  |  |
| (?) Connecting with other servers (SSH). | X |  |  |  |
| (which servers) Automating SSH connections. No password protocols. |  | X |  |  |
| Decide which OAuth grant type to use. |  | X |  |  |
| (Keycloak: how to integrate) Role checking. User, admin, developer, etc. Internal only? |  |  | X |  |
| (for deployment) Running/setting up a virtual machine. | X |  |  |  |
| Reboot system routine. |  |  |  | X |
| Internal server error handling. Call out for help with GitHub notifications? |  |  | X |  |
| Safety net for running processes. Persisting status. |  |  |  | X |
| Maintenance routine check for files on the server. |  |  |  | X |
| Log system for server changes. |  |  |  | X |
| Test calls to API, Ndrive, DaSSCo servers. Through REST API/SSH. |  | X |  |  |
| (script) Continually running Python app for making API calls. | X |  |  |  |
| New files ready check. Ask others? Check directly in directories? | X |  |  |  |
| Create metadata asset through API. | X |  |  |  |
| Update metadata asset through API. |  | X |  |  |
| Functions for open/close/reopen SAMBA connections. | X |  |  |  |
| Tracking system for SAMBA connections. |  |  | X |  |
| Check files have been transferred to ERDA (DB). End storyline. | X |  |  |  |
| Clean up concluded files. |  | X |  |  |
| Understandable directory system. Documented. |  |  | X |  |
| Event-based triggers for the app. | X |  |  |  |
| Keep track of file status. |  | X |  |  |
| Keep track of file statistics. |  |  |  | X |
| Create a storyline for files. | X |  |  |  |
| Check file/data validity. |  | X |  |  |
| File transfer to and from connected servers. SAMBA, SFTP, HTTP. | X |  |  |  |
| JSON data fields check. Documentation data-field table update. |  |  | X |  |
| Logging/updating the status of assets. |  |  | X |  |
| User-friendly README.md for API usage. |  |  | X |  |
| Developer-friendly README.md for server functionalities and code base. |  |  | X |  |
| Dashboard functions made as REST API instead for hosting somewhere else? |  |  |  | X |
| Dashboard with file status. |  |  |  | X |
| Dashboard with file statistics. |  |  |  | X |
| Dashboard direct user access to the API. |  |  |  | X |
| Dashboard manual updates through the API. |  |  |  | X |
| Dashboard creating an institution, etc., through the API. |  |  |  | X |
| Dashboard getting institution, etc., lists through the API. |  |  |  | X |
| Dashboard for auditing through the API. |  |  |  | X |
| Documentation for integration server. Flowchart, data flow diagram (main scenarios). |  |  |  | X |
| Documentation for integration server Python app. Class diagram, source code comments. |  |  |  | X |
Baeist commented 1 year ago
Requirements with priority, MUST (M) / SHOULD (S) / COULD (C) / WON'T (W), and comments:
**(S) Correct API documentation. (NT)** We need to fully understand the API. This is NorthTech's job.

**(M) Automated authorization OAuth 2.0 setup for server requests.** The process of setting up connections to the different parts of the full pipeline must be done securely and should be done automatically.

**(M) Connecting with other servers (SSH).** Setting up SSH connections will enable SFTP through the "paramiko" library. Other solutions here could be setting up REST APIs for transferring through HTTP.

**(S) Automating SSH connections. No password protocols.** This will make the connection easier to work with. I believe paramiko also has some of this built into it.
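A minimal sketch of what a key-based (passwordless) SFTP transfer with paramiko could look like; the host name, user, and paths below are placeholders, not the project's real values:

```python
# Minimal sketch: key-based (passwordless) SFTP transfer with paramiko.
import paramiko

def transfer_file(host: str, user: str, key_path: str,
                  local_path: str, remote_path: str) -> None:
    client = paramiko.SSHClient()
    # Only trust hosts already listed in the local known_hosts file.
    client.load_system_host_keys()
    client.connect(host, username=user, key_filename=key_path)
    try:
        sftp = client.open_sftp()
        sftp.put(local_path, remote_path)  # upload; sftp.get() would download
        sftp.close()
    finally:
        client.close()

transfer_file("ndrive.example.org", "integration",
              "/home/integration/.ssh/id_ed25519",
              "image_001.tif", "/incoming/image_001.tif")
```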
**(S) Decide which OAuth grant type to be used.** For setting up how to protect access to the processes and data, we have to choose a specific OAuth grant type. Currently leaning towards using the "Authorization Code" type.
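For illustration, a hedged sketch of fetching a token from a Keycloak-style token endpoint (Keycloak comes up in the table above); the URL, realm, client ID, and secret are placeholders. The `client_credentials` grant shown here fits unattended server-to-server calls; the Authorization Code type discussed above adds a browser-based step to obtain a code first.

```python
# Sketch: fetching an OAuth 2.0 access token for server-to-server requests.
# Endpoint, realm, client ID, and secret are placeholders.
import requests

TOKEN_URL = "https://auth.example.org/realms/dassco/protocol/openid-connect/token"

def get_access_token(client_id: str, client_secret: str) -> str:
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",  # unattended machine-to-machine grant
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()["access_token"]

token = get_access_token("integration-server", "***")
headers = {"Authorization": f"Bearer {token}"}  # attach to every API call
```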
**(C) Role checking. User, admin, developer, etc. Internal only?** Who can do what with the files. Includes who can access the data and from where. Not certain this is a needed feature here; potentially it should be handled by the DaSSCo storage web app instead.

**(M) Running/setting up a virtual machine.** A place for everything to run.

**(W) Reboot system routine.** A way to safely close down everything and restart it. This should include checking that no data is lost, potentially opening new connections after reboot, checking that other services are available, starting the main Python client, and updating the log.

**(C) Internal server error handling. Call out for help with GitHub notifications?** A way to notify staff that something has gone wrong. This is not meant for individual data files but rather system/server-wide errors.
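One possible shape for such a notification, sketched against the GitHub REST issues endpoint; the label and the token handling are assumptions:

```python
# Sketch: alerting staff by opening a GitHub issue on a server-wide error.
import requests

def notify_staff(title: str, body: str, token: str) -> None:
    resp = requests.post(
        "https://api.github.com/repos/NHMDenmark/DaSSCo-Integration/issues",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        json={"title": title, "body": body, "labels": ["server-error"]},
        timeout=30,
    )
    resp.raise_for_status()

# Example: called from a top-level exception handler, e.g.
# notify_staff("Integration server error", traceback.format_exc(), GH_TOKEN)
```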
**(W) Safety net for running processes. Persisting status.** A way to keep track of each file's status throughout the processes needed. Potentially a way to reset a file to its original state before any processes were initiated.

**(W) Maintenance routine check for files on the server.** Assuming that lots of files are in transit at any one point in time, having a way to check that none of them are stuck could be good. Should include a way to alert staff to any issues.

**(W) Log system for server changes.** Logging for server status, including connection status, reboots, and total file status.

**(S) Test calls to API, Ndrive, DaSSCo servers. Through REST API/SSH.** Basically a routine for checking that connections are working and that other services are available for use.

**(M) Continually running Python app for making API calls.** Processes would run through a Python app that should be continually available.

**(M) New files ready check. Ask others? Check directly in directories? Can be event-based or time-based. Both?** How and when to receive files. This can be done in different ways; SSH/SFTP or HTTP seems best. Who contacts whom has to be decided. Currently I think it's best if contact is initiated from here; however, setting up a REST API for others to initiate contact/transfer would also be fine. How often to ask for new data could depend on currently running file processes, or it could be time-based.
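A sketch of the time-based variant: poll a drop directory and hand unseen files to the pipeline. The directory, interval, and handler are placeholders; an event-based variant could use a file-watching library such as watchdog instead.

```python
# Sketch: time-based check for new files in a drop directory.
import time
from pathlib import Path

DROP_DIR = Path("/data/incoming")   # placeholder directory
POLL_SECONDS = 60                   # placeholder interval

def handle_new_file(path: Path) -> None:
    print(f"new file ready: {path}")  # stand-in for the real processing

def poll_for_new_files() -> None:
    seen: set[Path] = set()
    while True:
        for path in DROP_DIR.iterdir():
            if path.is_file() and path not in seen:
                seen.add(path)
                handle_new_file(path)  # hand off to the pipeline
        time.sleep(POLL_SECONDS)
```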
**(M) Create metadata asset through API.** Make use of the NT API to create a metadata asset for the DaSSCo storage and transfer images through an SMB connection.

**(S) Update metadata asset through API.** Make use of the NT API to update a metadata asset for the DaSSCo storage.
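A heavily hedged sketch of these two calls; the base URL, endpoint paths, payload fields, and response field below are hypothetical stand-ins for whatever the NT API documentation actually specifies:

```python
# Sketch: creating and updating a metadata asset through the NT API.
# Endpoints and fields are hypothetical, pending NT's documentation.
import requests

API_BASE = "https://storage.example.org/api/v1"  # placeholder base URL

def create_asset(metadata: dict, headers: dict) -> str:
    resp = requests.post(f"{API_BASE}/assets", json=metadata,
                         headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()["asset_guid"]  # hypothetical response field

def update_asset(asset_guid: str, changes: dict, headers: dict) -> None:
    resp = requests.put(f"{API_BASE}/assets/{asset_guid}", json=changes,
                        headers=headers, timeout=30)
    resp.raise_for_status()
```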
**(M) Functions for open/close/reopen SAMBA connections.** Make use of the NT API to control SAMBA connections. Should be event-based.

**(C) Tracking system for SAMBA connections.** A way to keep track of which SAMBA connections exist and their status for each file.

**(M) Check files have been transferred to ERDA (DB). End storyline.** Protocol for ensuring that image files have been correctly stashed in ERDA.
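One way such a check could work is by comparing checksums, sketched below; how the remote checksum is obtained (an API field, an SSH command) is left open here:

```python
# Sketch: confirming a file landed intact by comparing SHA-256 checksums.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()

def confirm_stored(local: Path, remote_checksum: str) -> bool:
    # remote_checksum would come from ERDA, however it is exposed.
    return sha256_of(local) == remote_checksum
```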
**(S) Clean up concluded files.** Deletion of files once Specify has received new information and ERDA has confirmed everything is correctly saved there.

**(C) Understandable directory system.** A way to get an overview of where each file is in its process, probably sorted by status. For example, something could be in an "awaiting_return_from_refinery" folder if we had a status denoting that.

**(M) Event-based triggers for the app.** When a file's status is updated, something needs to happen.

**(S) Keep track of file status.** A way to update and keep track of each file's individual status.
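A sketch combining the two previous points: persist each file's status and fire registered handlers when it changes (observer style). Storing statuses in a JSON file is an assumption made for illustration.

```python
# Sketch: persisted file statuses with event-based triggers on change.
import json
from pathlib import Path

STATUS_FILE = Path("file_status.json")  # assumed storage location
_handlers: dict[str, list] = {}         # status -> callbacks to fire

def on_status(status: str, callback) -> None:
    _handlers.setdefault(status, []).append(callback)

def set_status(file_id: str, status: str) -> None:
    statuses = json.loads(STATUS_FILE.read_text()) if STATUS_FILE.exists() else {}
    statuses[file_id] = status
    STATUS_FILE.write_text(json.dumps(statuses, indent=2))  # persist first
    for callback in _handlers.get(status, []):              # then trigger
        callback(file_id)

# Example wiring: start the upload step once metadata has been created.
on_status("metadata_created", lambda fid: print(f"start upload for {fid}"))
set_status("image_001", "metadata_created")
```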
**(W) Keep track of file statistics.** This would not be directly needed; however, I still think having something like this would be nice. It would help in case of crashes or other system errors where recovery is involved.

**(M) Create a storyline for files. Or another system to determine where each file should go before using the NT API.** There has to be a way to determine what processes a file needs to go through before its "case" can be closed: sort of a tree of statuses a file would have to travel through.
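A minimal sketch of such a storyline as a transition map from one status to the next; the status names are invented for illustration, and real branching would need a richer structure than this flat map:

```python
# Sketch: a storyline as a map from current status to next status.
STORYLINE = {
    "received":              "metadata_created",
    "metadata_created":      "transferring_to_erda",
    "transferring_to_erda":  "stored_in_erda",
    "stored_in_erda":        "cleaned_up",  # end of the storyline
}

def next_status(current: str) -> str | None:
    return STORYLINE.get(current)  # None means the file's case is closed
```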
**(S) Check file/data validity.** Check that expected and filled-out data fields are correctly noted. A way to handle acceptable exceptions, and a way to handle unacceptable ones as well.
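A sketch of a minimal validity check over required metadata fields; the field names are placeholders for whatever the real data-field table specifies:

```python
# Sketch: check that required metadata fields are present and non-empty.
REQUIRED_FIELDS = ["asset_guid", "institution", "collection", "file_format"]

def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems

errors = validate_metadata({"asset_guid": "abc-123", "institution": "NHMD"})
# -> ["missing or empty field: collection", "missing or empty field: file_format"]
```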
**(M) File transfer to and from connected servers (SAMBA, SFTP, HTTP).** Transfer of files once connections and everything else are in place.

**(C) JSON data fields check. Documentation data-field table update.** DaSSCo's internal documentation for metadata fields should be updated.

**(C) Logging/updating the status of assets.** Keeping a log of status changes and unfulfilled status changes to help in case something goes wrong.

**(C) User-friendly README.md for API usage.** README for NT API usage that isn't just their documentation. Maybe not needed if their documentation gets updated.

**(C) Developer-friendly README.md for server functionalities and code base.** README document describing how the integration server works and potentially how to extend or implement new features. Description of any scripts or timed processes the server is meant to be running.

**(W) Dashboard functions made as REST API instead, for hosting somewhere else?** Potentially it would be nice to have stats come out of this server to help maintain a general overview of everything in one single place. After further talks, this would be done from the DaSSCo storage service, so none of the dashboard entries will be implemented as a direct web app built on top of the integration server.

**(W) Dashboard with file status.** See above.

**(W) Dashboard with file statistics.** See above.

**(W) Dashboard direct user access to the API.** See above.

**(W) Dashboard manual updates through the API.** See above.

**(W) Dashboard creating an institution, etc., through the API.** See above.

**(W) Dashboard getting institution, etc., lists through the API.** See above.

**(W) Dashboard for auditing through the API.** See above.

**(W) Documentation for integration server. Flowchart, data flow diagram (main scenarios).** Documentation in the form of diagrams. These exist in some form, but not with the integration as a focus point.

**(W) Documentation for integration server Python app. Class diagram, source code comments.** Code documentation for the Python application. This is not really a "W"; however, the degree to which it is done in detail is.