Kazadhum / file_intro_plugin

0 stars 0 forks source link

Further suggestions #1

Open Kazadhum opened 1 year ago

Kazadhum commented 1 year ago

Hello @miguelriemoliveira, @rarrais and @MisterOwlPT!

I've got a working plugin now, and I was wondering if you had any suggestions. I wrote the plugin to be used for anything, so I don't mention ATOM or calibration specifically. The way I have it now, the plugin assumes the results file is a .csv file (should I expand this to other file formats?). The Rigelfile looks like this.

As you can see, it receives the row and column of the value to be used to evaluate the results, as well as thresholds for the various levels of quality. Finally, it receives a required level of quality, or how good the results need to be to 'pass' the test.

What I wanted to ask is if this seems right to you and, furthermore, if you have any suggestions.

miguelriemoliveira commented 1 year ago

Hi @Kazadhum ,

looks good from my side.

rarrais commented 1 year ago

Hi @Kazadhum , a suggestion from my side: instead of only allowing the user to check for a given row and column, we should have an additional mode that let the analysis be made by looking at the last row of a given column (or even the last column of a given row).

Let me illustrate this need with a conceptual example: let's imagine a process runs for a given time and keeps appending new data to new rows of a CSV file. In that case, the number of final columns/rows might not be known in advance, and the analysis would check if the last value was above/below a given user-defined threshold.

Another interesting feature would be iterating over a given column/row and checking if any value is above/below a given threshold. This might be useful for checking for errors during execution (and not only looking at the last value).

What do you think?

Kazadhum commented 1 year ago

Those sound like good suggestions! I'll try implementing those right away! Thank you

Kazadhum commented 1 year ago

Hi @Kazadhum , a suggestion from my side: instead of only allowing the user to check for a given row and column, we should have an additional mode that let the analysis be made by looking at the last row of a given column (or even the last column of a given row).

Let me illustrate this need with a conceptual example: let's imagine a process runs for a given time and keeps appending new data to new rows of a CSV file. In that case, the number of final columns/rows might not be known in advance, and the analysis would check if the last value was above/below a given user-defined threshold.

This has been implemented now, in the form of two fields in the Rigelfile

Kazadhum commented 1 year ago

Another interesting feature would be iterating over a given column/row and checking if any value is above/below a given threshold. This might be useful for checking for errors during execution (and not only looking at the last value).

@rarrais I've implemented this for columns. You specify a list of columns to verify, a boolean that determines if the verification is done or not, the threshold value and the comparison operator (maybe in some use cases the user doesn't want the values to be lower than the threshold!)

Given the nature of csv files, does it make sense to also do this for rows?

rarrais commented 1 year ago

Good work!

Let me propose an additional suggestion file looking at the Rigelfile you linked: one of the philosophies of Rigel is to reduce the complexity and verbosity of the Rigelfile for users. Could you attempt to minimize the number of fields while keeping the same level of functionality? Let me illustrate it with an example:

  # Value verification
  verify_columns: True
  columns_to_verify:
    - "RMS (pix)"

Would it be possible to remove the verify_columns variable and inducing that the user wants to verify columns by looking if he/she specified any value under the columns_to_verify variable? This is just an example, there are other simplifications that I'm confident we can achieve.

Given the nature of csv files, does it make sense to also do this for rows?

Maybe not.

Kazadhum commented 1 year ago

Made a small change for simplifying the Rigelfile. The "comparison operator" field was messy, confusing and unnecessary, so I changed it. Now the Rigelfile has two optional fields for the thresholds:

Doing this simplified both the plugin code and the Rigelfile. if the columns_to_verify field is not empty, at least one of these is needed to do the verification. Critically, this now allows for both a lower and upper threshold to be used simultaneously.

Note: if no threshold is specified, the verification does not occur but the plugin issues a warning to the user.