BHoM / Revit_Toolkit

A set of tools enabling exchange of information between BHoM and Revit.
GNU Lesser General Public License v3.0
27 stars · 13 forks

Revit_Toolkit: remove whitespaces when matching names on Push #574

Status: Open. pawelbaran opened this issue 4 years ago.

pawelbaran commented 4 years ago

Description:

At the moment, BHoM properties are matched with Revit types by name. Sometimes this fails because of whitespace differences on either side of the comparison (e.g. HEB200 vs HEB 200). It would be good to ignore these.
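A minimal sketch of the whitespace-insensitive matching requested above, written in Python for illustration (the toolkit itself is C#). The functions `normalise_name` and `find_matching_type` are hypothetical and not part of the Revit_Toolkit API; the list of strings stands in for the Revit type names collected during Push.

```python
from typing import Iterable, Optional


def normalise_name(name: str) -> str:
    """Remove all whitespace characters (including tabs and non-breaking spaces)."""
    # str.split() with no argument splits on any run of Unicode whitespace.
    return "".join(name.split())


def find_matching_type(bhom_name: str, revit_type_names: Iterable[str]) -> Optional[str]:
    """Return the first Revit type name equal to bhom_name once whitespace is ignored."""
    target = normalise_name(bhom_name)
    for type_name in revit_type_names:
        if normalise_name(type_name) == target:
            return type_name
    return None  # no match: the caller decides how to report this


print(find_matching_type("HEB200", ["IPE 200", "HEB 200", "HEA 200"]))  # prints: HEB 200
```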

pawelbaran commented 4 years ago

After a quick glance, this looks like a more convoluted problem than expected. Therefore, pushing to 3.2, to be resolved together with #582.

pawelbaran commented 3 years ago

I could resolve this now, but I have started thinking about smarter name matching than simply removing whitespace: it would be great if HEB200 could match not only HEB 200 but also HE200B etc. I wonder whether it would be worth implementing a more intelligent string matching mechanism in general, to handle typos or minor mismatches. This could be useful in other toolkits, but also e.g. in the method search.

What do you think @al-fisher @IsakNaslundBh @FraserGreenroyd?

FraserGreenroyd commented 3 years ago

At what point is HEB200 == HE200B? That's more than a mismatch to me, and it should not be the code's responsibility to fix. The problem is that if the toolkit looks at something, thinks "Oh, I can fix that" and does so, the user risks getting a result they didn't intend while everything appears to be working fine.

I would agree with removing spaces at most, but not with any other changes to the string. I would rather error out to the user and let them fix it, to make sure they're getting the right workflow and not the workflow we think might be right.
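A sketch of the stricter policy described here, assuming whitespace is the only tolerated difference and anything else is reported back to the user rather than silently fixed. `NameMatchError` and `match_or_fail` are hypothetical names used for illustration only.

```python
from typing import Sequence


class NameMatchError(ValueError):
    """Raised when no unique Revit type matches, so the user can fix the name."""


def strip_spaces(name: str) -> str:
    # The only tolerance applied: whitespace is ignored, nothing else is altered.
    return "".join(name.split())


def match_or_fail(bhom_name: str, revit_type_names: Sequence[str]) -> str:
    """Return the single matching Revit type name, or raise NameMatchError."""
    target = strip_spaces(bhom_name)
    matches = [n for n in revit_type_names if strip_spaces(n) == target]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise NameMatchError(
            f"No Revit type matches '{bhom_name}'. "
            f"Available types: {', '.join(revit_type_names)}"
        )
    # Several types collapse to the same name once whitespace is removed: ambiguous.
    raise NameMatchError(f"'{bhom_name}' matches several Revit types: {', '.join(matches)}")
```

Rejecting ambiguous matches (e.g. both `HEB 200` and `HEB200` existing in the model) follows the same reasoning: the user, not the toolkit, decides which type is intended.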

pawelbaran commented 3 years ago

HEB200 and HE200B are two commonly used names for the same thing. My gut feeling is that we can find many more such cases.

al-fisher commented 3 years ago

Yes, agreed, this is worth putting some thought into generalising. I think the key here is not to "hard code" the matching assumptions: for given workflows it will be really valuable to be able to override, add to or customise what matches to what.

I think we'll ultimately need a specific option for string comparison that allows user input. This is not dissimilar to the wildcard and regex work @alelom has recently been looking at for the file adapter, as well as the configs for the diffing work.

In fact, I think this is effectively a Comparer Config specific to string comparison. @alelom @pawelbaran
This would allow simple things like "ignore whitespace" and "allow character permutations", as well as more complex lookups in the future, such as typos, alternate spellings and synonyms, perhaps based on datasets that the user can replace.

We can then create very simple standard configs (combinations of settings) and/or datasets of common equivalent strings to help with the most common workflows.
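One possible shape for such a string Comparer Config, sketched in Python under the assumption that the settings are simple toggles plus a user-replaceable dataset of equivalent strings. `StringComparerConfig`, `strings_equal` and the two example configs are hypothetical and do not correspond to an existing BHoM type or method.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class StringComparerConfig:
    """Hypothetical comparison settings; not an existing BHoM type."""

    ignore_whitespace: bool = True
    ignore_case: bool = False
    # Compare sorted characters, so "HEB200" and "HE200B" become equal.
    allow_permutations: bool = False
    # User-replaceable dataset: each set holds strings to be treated as equivalent.
    equivalents: List[FrozenSet[str]] = field(default_factory=list)


def _normalise(text: str, config: StringComparerConfig) -> str:
    if config.ignore_whitespace:
        text = "".join(text.split())
    if config.ignore_case:
        text = text.casefold()
    if config.allow_permutations:
        text = "".join(sorted(text))
    return text


def strings_equal(a: str, b: str, config: StringComparerConfig) -> bool:
    """Compare two strings under the given config."""
    na, nb = _normalise(a, config), _normalise(b, config)
    if na == nb:
        return True
    # Fall back to the equivalence dataset, normalising its entries the same way.
    for group in config.equivalents:
        normalised = {_normalise(item, config) for item in group}
        if na in normalised and nb in normalised:
            return True
    return False


# Example "standard configs" (combinations of settings) for common workflows.
STRICT = StringComparerConfig(ignore_whitespace=False)
SECTION_NAMES = StringComparerConfig(equivalents=[frozenset({"HEB200", "HE200B"})])
```

Note that `allow_permutations` compares sorted characters, so it also treats `HEB200` and `HEB020` as equal. This is the kind of silent over-matching raised earlier in the thread, which is why an explicit equivalence dataset may be the safer default for section names.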

pawelbaran commented 3 years ago

This sounds like a Milestone workshop to me, to get others' thoughts too?

FraserGreenroyd commented 3 years ago

> This sounds like a Milestone workshop to me, to get others' thoughts too?

Agreed

al-fisher commented 3 years ago

Sounds good

IsakNaslundBh commented 3 years ago

Agree with all the above. This also links in to https://github.com/BHoM/BHoM_Datasets/issues/60, which is another instance of the exact same issue: sections having slightly different names in slightly different contexts.

There I had the idea of some hard-coded alternatives stored on the sections, but if we can fix it with cleverer string comparison, that would be even better.

vietle-bh commented 8 months ago

This issue seems relevant to the recent discussion on fuzzy string matching!

https://github.com/BHoM/AGS_Toolkit/blob/ebcc28ff5232fcddff0380e939b45130a54feec2/AGS_Engine/Compute/Ratios/FuzzyMatching.cs#L49
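For reference, a generic fuzzy-matching sketch based on a normalised Levenshtein similarity. This is not a transcription of the linked `FuzzyMatching.cs`, which may use a different formula; `similarity`, `best_match` and the 0.8 threshold are illustrative assumptions.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling towards 0.0 as they differ more."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(name: str, candidates: Sequence[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None
```

With this measure `HEB 200` scores about 0.86 against `HEB200`, but `HE200B` only about 0.67, so edit distance alone would not catch the HEB200/HE200B equivalence discussed above. A fuzzy score is probably best used to suggest candidates to the user rather than to apply a match automatically.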

pawelbaran commented 8 months ago

> This issue seems relevant to the recent discussion on fuzzy string matching!
>
> https://github.com/BHoM/AGS_Toolkit/blob/ebcc28ff5232fcddff0380e939b45130a54feec2/AGS_Engine/Compute/Ratios/FuzzyMatching.cs#L49

Love it, thanks @vietle-bh!