BHoM / Excel_UI

GNU Lesser General Public License v3.0
5 stars 4 forks source link

Excel Worksheet Name Limitations #345

Closed travispotterBH closed 2 years ago

travispotterBH commented 2 years ago

Issues addressed by this PR

Closes #344

Excel Adapter now alerts the user by recording an error when the name of a data table that is intended to be exported to excel is too long to be used as the name of a worksheet. This is favorable to automatically renaming the worksheet to something that conforms to the 31 character limit since that would be an unexpected behaviour from to view of the user .

Test files

Changelog

Additional comments

travispotterBH commented 2 years ago

@BHoMBot check required

bhombot-ci[bot] commented 2 years ago
@travispotterBH to confirm, the following checks are now queued: - `code-compliance` - `documentation-compliance` - `project-compliance` - `core` - `null-handling` - `serialisation` - `versioning` - `installer` There are 770 requests in the queue ahead of you.
travispotterBH commented 2 years ago

@FraserGreenroyd

I created a new partial class called Validation which include Validation.WorksheetName. Here I added an algorithm that checks the worksheet name against a set of rules and offers an alternate name if the name does not comply. The alternate name is simply "BHoM_Export_ddMMYY_HHmmss". I am open to suggestions on the alternate name. I figured this was a good alternate since we do not know what type of tables that may be used to create worksheets. They may not always be logs.

I am unsure if Excel.Adapter is an acceptable place for the Validation partial class, and am open to comment.

https://github.com/BHoM/Excel_Toolkit/blob/7507fdf3858cd9d258b54d90d39c01a5ae9810d0/Excel_Adapter/Validation/WorksheetName.cs#L28-L100

travispotterBH commented 2 years ago

@FraserGreenroyd Overall agreed.

Though, in the spirit of:

Thus I would go for modifying the provided name to shorten it and make it legal instead, this would result in a worksheet name that is closer to the original and provide some meaning to the worksheet.

I do want to makes sure that we are adding as much robustness as possible and capturing all the possible ways a user could give an invalid name. image So I don't think it is a simple as iterating to trim down to length.

I think that I should extend, rather than get rid of ModifyWorksheetName() so that it rids the string of each illegal element, while still conforming to your suggestion of a descriptive title.

Example:

'PressureCalc : Export/Log [Excel]' would be a very illegal name. It fails the length check, valid characters checks, ending/begining with " ' " check, etc.

So I would create an algorithm that first removes illegal characters, then run checks again. Then if it is still too long, then truncates the string to fit length limit.

I also think it is worthwhile to check the workbook for any other worksheet that might be named the exact same to avoid that issue, and offer a suggestion (probably increment at the end).

FraserGreenroyd commented 2 years ago

@travispotterBH yes agreed, I was happy with the methods which confirmed a name wasn't history and didn't contain illegal characters.

I'm happy with any order of how we change the string (whether we remove blanks first or something else - arguably it is better to remove illegal characters first as they'll have to be removed regardless), I'd just prefer to see it done in a workflow where the string is returned at the point it becomes legal rather than a hard/fast rule of legal or not legal going to the modify method which I feel is a bit clunky in terms of programming workflow.

That said we are getting closer to putting out the source of the fire, and would support a validate method which will check all of the issues you highlight in that screenshot 👍

travispotterBH commented 2 years ago

@FraserGreenroyd Please check the most recent commit to this. I think I've covered all the cases, and conform to the naming suggestions that you noted.

Essentially I check the name for validity and temporarily log the type of invalidity. Then under ModifyWorksheetName() I check the first invalidity and run a method to resolve that invalidity. Then the IsValidName method is run again to see if the modification resolved the worksheet name to a valid name. If not, the loop continues until it is.

I've set it to check in the following order:

  1. Uniqueness
  2. Null or Whitespace
  3. Reserved Words
  4. Invalid Characters
  5. Begin/End with invalid characters (important distinction being that " ' " is not invalid unless in the first or last position of the name.
  6. Character Limit

After each check, if invalid, the invalidity is logged, so the modifications to the worksheet will also happen in the same order or the logged invalidity.

Modification attempts: For each check, the modification that is attempted is as follows:

  1. Uniqueness Prepend timestamp (important to prepend since later on if we need to trim the string for character limit we do so by dropping the ending.)

  2. Null or Whitespace If null or blank, give generalized name of "BHoMExport" + timestamp

  3. Reserved Words if is a reserved word, , give generalized name of "BHoMExport" + timestamp

  4. Invalid Characters remove invalid characters

  5. Begin/End with invalid characters (important distinction being that " ' " is not invalid unless in the first or last position of the name. remove invalid characters

  6. Character Limit trim string to match character limits

Given that the validity check is done after each modification, if the modification then causes a different invalid state (such as prepending a timestamp which may increase the length beyond the character limit), the process continues until fully resolved to a unique name.

Example: The invalid name I noted above: 'PressureCalc : Export/Log [Excel]' would be a very illegal name.

resolves to: image

After a few runs it reached a point where it was legal to excel, except that it was not unique and the prepended timestamp took over.

I'll still want to test a bit more tomorrow to verify there are no bugs with this, but feel confident about the overall design of the algorithm.

travispotterBH commented 2 years ago

This is starting to look good now @travispotterBH , nice work 😄

Thanks :) :)

A key part is the use of the out keyword to obtain additional outputs from a method. On the one hand, we could use C#7 and use multiple return values from a method, but we don't currently support the use of C#7 in the code base (that's an ongoing discussion currently). So the other way to handle it also allows us to keep track of only one variable rather than 2 in the first place. For this, if our IsValidName method returns a List<ValidationError> objects, then we can infer whether the name is valid by whether or not the count of the list is 0 or not. If validationIssues.Count == 0 then we can infer we have no validation issues, and the name is thus valid. The while loop can then become a while(validationIssues.Count > 0) to achieve the same effect as you're proposing currently, but now we don't need to keep track of the bool variable as well. That said, using a list doesn't make much sense given we only work with the first item when modifying the string. So we either need to change the modification to work on all the options for invalidity, or remove the list and use a single object that returns null when everything is valid instead.

For the reserved words solution, I think it would be sufficient to simply prepend BHoM to the start rather than renaming the sheet completely. Someone wanted the reserved word as the original sheet name for a reason, so we should aim to accommodate that as much as possible within the constraints of Excel. In this case, History is reserved but BHoM History isn't from my understanding of the screenshot, so changing to BHoM History is slightly more beneficial than BHoM_Export_290622102534 for example.

Makes since! I updated to get rid of the bools and the list. I now use the switch statements to return the enum value for the issue directly. This eliminates both the out and the list, as will cleans up the code a bit. Now the trigger on the while loop is based on the enum value, and if the issue != WorksheetValidation.Valid then the code will continue until it is.

https://github.com/BHoM/Excel_Toolkit/blob/8ef28092835b04aae1236c625cbedbe6cc950b70/Excel_Adapter/Validation/WorksheetName.cs#L60-L82

https://github.com/BHoM/Excel_Toolkit/blob/8ef28092835b04aae1236c625cbedbe6cc950b70/Excel_Adapter/Validation/WorksheetName.cs#L40-L45

FraserGreenroyd commented 2 years ago

@BHoMBot check compliance

bhombot-ci[bot] commented 2 years ago
@FraserGreenroyd to confirm, the following checks are now queued: - `code-compliance` - `documentation-compliance` - `project-compliance` - `branch-compliance` - `dataset-compliance` - `copyright-compliance` There are 3 requests in the queue ahead of you.
FraserGreenroyd commented 2 years ago

Compliance looks happy @travispotterBH so just the final comments and this should be good from the code-review perspective 👍

FraserGreenroyd commented 2 years ago

@BHoMBot check required

bhombot-ci[bot] commented 2 years ago
@FraserGreenroyd to confirm, the following checks are now queued: - `code-compliance` - `documentation-compliance` - `project-compliance` - `core` - `null-handling` - `serialisation` - `versioning` - `installer`
travispotterBH commented 2 years ago

@BHoMBot check dataset-compliance @BHoMBot check ready-to-merge

bhombot-ci[bot] commented 2 years ago
@travispotterBH to confirm, the following checks are now queued: - `dataset-compliance` - `ready-to-merge` There are 9 requests in the queue ahead of you.
travispotterBH commented 2 years ago

@BHoMBot check copyright-compliance

bhombot-ci[bot] commented 2 years ago
@travispotterBH to confirm, the following checks are now queued: - `copyright-compliance`
FraserGreenroyd commented 2 years ago

@BHoMBot check ready-to-merge

bhombot-ci[bot] commented 2 years ago
@FraserGreenroyd to confirm, the following checks are now queued: - `ready-to-merge`