List of admissible users

gpetho commented 2 months ago

@laczkol @haalasz @CsongorFreytag

We're at the point in the application where the validated output csv file needs to be saved to the disk and uploaded to the destination directory. When the file is uploaded, a notification email is sent to an email address.

I suggest that the csv file should be named according to the following format: {user name}_{YY-MM-DD}_{HH:MM:SS}.csv, that is, the name of the user who is uploading the file, followed by the current date and time at which the csv file is being generated and uploaded to the destination directory.

The user will enter their user name in the respective field in the UI of the browser application, and the application will check whether this user name is included in the list of admissible user names. If it is not, the application will output an error message ("unknown user") and refuse the upload; if it is, the application will add the user name to the file name and proceed with the upload. I think this is the right way to do it. It makes it clear who the data in the csv is coming from, and it also introduces a weak form of user authentication as a bonus.

The question is what list of user names should be used for this. There are two obvious options: using either the GitHub or the Keybase identifier column of the list of colleagues here: https://github.com/DEpt-metagenom/CHAT

Should we use either of these, and if so, which one, or a completely different list?

laczkol commented 2 months ago

I'm glad to hear that.

I suggest that the csv file should be named according to the following format: {user name}_{YY-MM-DD}_{HH:MM:SS}.csv, that is, the name of the user who is uploading the file, followed by the current date and time at which the csv file is being generated and uploaded to the destination directory.

The user will enter their user name in the respective field in the UI of the browser application, and the application will check whether this user name is included in the list of admissible user names. If it is not, the application will output an error message ("unknown user") and refuse the upload; if it is, the application will add the user name to the file name and proceed with the upload. I think this is the right way to do it. It makes it clear who the data in the csv is coming from, and it also introduces a weak form of user authentication as a bonus.

Good idea, I support this.

The question is what list of user names should be used for this. There are two obvious options: using either the GitHub or the Keybase identifier column of the list of colleagues here: DEpt-metagenom/CHAT

@CsongorFreytag @haalasz can you imagine these submissions being mentioned in our GitHub repositories, e.g. in project and quality management issues or sequencing run logging issues? If this is the case and corrections are requested in the metadata, it might be more practical to use the GitHub username to name the submitted files, as it would then be very easy to tag the submitter of the metadata. I'm not in favor of using Keybase usernames instead of GitHub usernames, but I'm also not against it if you think it's better for any reason. When we register new users on one of our servers, we use the new user's last name and the first letter of the first name to create the user ID. I see this naming strategy as an alternative that is also easy to follow.

CsongorFreytag commented 2 months ago

Should we use either of these, and if so, which one, or a completely different list?

Yes, you can. Use the GitHub user name and, if possible, allow saving the csv without uploading it.

gpetho commented 2 months ago

if possible, allow saving the csv without uploading it.

Yes, the file is saved to the local machine first, before it is copied over to the destination directory on the remote machine (which will be ZFS) anyway. We can retain it on the local machine that the app is running on (which will be gpu-2 on the HUN-REN Cloud) or delete it right away after it has been copied to ZFS. Sure, we can add a switch to allow the user to specify whether the csv should be uploaded to ZFS or not after it has been created, but just out of curiosity, what would be the point of doing this? I don't see how just saving it to the local machine without uploading it to ZFS could be useful, since nobody uses that machine except me. The point of the upload step is exactly to make it easier for you and Zsolt to access the output file, so you don't have to log in to the machine that the app is running on and have to copy it over yourself.
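To make the save-then-copy flow and the proposed switch concrete, here is a minimal sketch. It assumes the ZFS destination is reachable as an ordinary mounted directory, which may not match the real setup; the function name `store_csv` and the `upload` and `keep_local` parameters are hypothetical and not taken from app.py.

```python
import shutil
from pathlib import Path


def store_csv(
    local_path: Path,
    destination_dir: Path,
    upload: bool = True,
    keep_local: bool = True,
) -> Path:
    """Copy an already-saved csv to destination_dir and return the final path.

    Assumption: destination_dir is a mounted directory (e.g. the ZFS share),
    so a plain file copy is enough. A remote copy would need a different call.
    """
    if not upload:
        # The switch is off: the file only stays on the local machine.
        return local_path

    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / local_path.name
    shutil.copy2(local_path, target)  # copies content and file metadata

    if not keep_local:
        # Delete the local copy right after it has been copied over.
        local_path.unlink()
    return target
```

With `upload=False` the function returns the local path unchanged, which corresponds to the "save without upload" case discussed above.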

CsongorFreytag commented 2 months ago

I don't know the UX flow, but it would be useful when the user name is missing or entered incorrectly: the table that has already been filled in is not lost, so there is less frustration for the user, I think.

gpetho commented 2 months ago

I see. Saving the table locally is not the right solution for this problem in my opinion, but this is a good point that I have not thought about. What we will do is force the user to enter their user name and verify it before the user opens their input Excel or csv file that they want to validate or before the user can enter a new record (table row) to be validated manually, instead of demanding this before the file is saved. This guarantees that work is never lost.

gpetho commented 2 months ago

@iwmstjp I have updated app.py and added the list of hashed user names to it. I have also added a function called check_user(), which returns True if the user name entered by the user is in the column of GitHub user names in the list of coworkers at https://github.com/DEpt-metagenom/CHAT and returns False otherwise.
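For orientation, a check against a list of hashed user names could look roughly like the sketch below. The hash algorithm (SHA-256), the constant name `ADMISSIBLE_USER_HASHES`, the helper `hash_user_name` and the example user names are all assumptions for illustration; the actual code in app.py may differ.

```python
import hashlib


def hash_user_name(user_name: str) -> str:
    """Return the hex SHA-256 digest of a user name (assumed algorithm)."""
    return hashlib.sha256(user_name.encode("utf-8")).hexdigest()


# Hypothetical list. In a real application the digests would be precomputed
# and stored in place of the plain user names.
ADMISSIBLE_USER_HASHES = frozenset(
    hash_user_name(name) for name in ("example-user-1", "example-user-2")
)


def check_user(user_name: str) -> bool:
    """Return True if the hash of user_name is in the admissible list."""
    return hash_user_name(user_name) in ADMISSIBLE_USER_HASHES
```

Storing only digests means the plain user names do not appear in the source file, while membership can still be checked by hashing the input.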

Implement the feature described in the previous comment. When the user tries to either "Import file" or add a new empty row to the table, the application should call this function on the (stripped) content of the user name field. It should output an error message if no user name was specified, and a different error message if a user name was specified but is not in the list of known users. In the latter case, the user should contact the database admin. (The button for adding a new row is not implemented yet in the version of the application covered by the pull request I accepted on Tuesday; I assume you have implemented it since then.)
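A minimal sketch of this check, assuming it runs before "Import file" and before a new row is added. The function name `user_name_error` and the exact message texts are hypothetical. The checking function is passed in as a parameter so that the example is self-contained.

```python
from typing import Callable, Optional


def user_name_error(
    raw_user_name: str, check_user: Callable[[str], bool]
) -> Optional[str]:
    """Return an error message, or None if the user name is acceptable."""
    user_name = raw_user_name.strip()
    if not user_name:
        # Case 1: nothing was entered in the user name field.
        return "Please enter your user name."
    if not check_user(user_name):
        # Case 2: a name was entered but is not in the list of known users.
        return "Unknown user. Please contact the database admin."
    return None
```

The UI handler would call this first and only open the file dialog or add the row when the result is `None`.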

The deadline for this is today at 2 pm, when we start the usual daily standup meeting. This check should be very easy to implement.

gpetho commented 2 months ago

@haalasz Please join us in this meeting: https://meet.google.com/ptf-aupp-dfu

gpetho commented 2 months ago

I suggest that the csv file should be named according to the following format: {user name}_{YY-MM-DD}_{HH:MM:SS}.csv, that is, the name of the user who is uploading the file, followed by the current date and time at which the csv file is being generated and uploaded to the destination directory.

As Masato correctly noticed, I made a mistake here: the colon character cannot appear in the file name, so instead of HH:MM:SS the correct format is HH-MM-SS.
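The corrected format can be sketched as follows. The function name `output_file_name` and the optional `now` parameter are hypothetical, and YY is assumed to mean a two-digit year.

```python
from datetime import datetime
from typing import Optional


def output_file_name(user_name: str, now: Optional[datetime] = None) -> str:
    """Build {user name}_{YY-MM-DD}_{HH-MM-SS}.csv without any colons."""
    if now is None:
        now = datetime.now()
    # %y is the two-digit year; hyphens replace the colons in the time part.
    return f"{user_name}_{now.strftime('%y-%m-%d_%H-%M-%S')}.csv"


# Example with a fixed timestamp:
# output_file_name("example-user", datetime(2024, 3, 5, 14, 7, 9))
# returns "example-user_24-03-05_14-07-09.csv"
```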

iwmstjp commented 2 months ago

This issue was resolved by the following commits:
feature: add user check by hash
fix: chech user_name before import files and input forms
fix: change file name format because : in a file path somtime makes errors
feature: add user_name and date to a saving file

DEpt-metagenom / MetagenoMongo

List of admissible users #15