ansys / pyfluent

Pythonic interface to Ansys Fluent
https://fluent.docs.pyansys.com
MIT License
277 stars 41 forks source link

Search module improvements #3385

Open mkundu1 opened 1 month ago

mkundu1 commented 1 month ago

Couple of proposed improvements for the search module from the chat:

Using the pyfluent.search function, how to search for exact settings API child names solver or settings? pyfluent.search("solver", match_whole_word=True) includes everything probably because it considers the prefix. pyfluent.search("settings", match_whole_word=True) includes results like .setup.models.discrete_phase.general_settings (Object)

I think the current behaviour for match_wholeword, where it splits by , is okay. We probably need a new control for a more api-focused search.

A search for "X" with match_whole_word=True means we match each node named "X", not each node that has "X" in its parent hierarchy. I haven't tested this or looked at the tests since Mainak's message so I don't know if I might be misinterpreting the issue. We should start by updating the tests to make sure that we are asserting all the required, correct behaviour. Then we can update the code to match with that.

mkundu1 commented 1 month ago

We should put an updated design first before making any code change.

Please also consider consolidating the duplicate data api_tree_251.pickle between api_tree/api_objects.json. This will reduce out package size. The legacy _search function can be removed if needed.

hpohekar commented 1 month ago

@seanpearsonuk @mkundu1

We can not remove _search function for following reasons.

  1. We do not support search_root feature in search.
  2. If we support search_root in search to be able to use it in flobject.py then we get circular import error therefore as per our last discussion we have kept it separately in _search only .

We have kept api_tree_251.pickle file as it is needed for _search. To improve search performance we have added api_tree/api_objects.json. I think we can have a single file to avoid duplication. Let's see.

hpohekar commented 1 month ago

Couple of proposed improvements for the search module from the chat:

Using the pyfluent.search function, how to search for exact settings API child names solver or settings? pyfluent.search("solver", match_whole_word=True) includes everything probably because it considers the prefix. pyfluent.search("settings", match_whole_word=True) includes results like .setup.models.discrete_phase.general_settings (Object) I think the current behaviour for match_wholeword, where it splits by , is okay. We probably need a new control for a more api-focused search.

A search for "X" with match_whole_word=True means we match each node named "X", not each node that has "X" in its parent hierarchy. I haven't tested this or looked at the tests since Mainak's message so I don't know if I might be misinterpreting the issue. We should start by updating the tests to make sure that we are asserting all the required, correct behaviour. Then we can update the code to match with that.

@mkundu1 @seanpearsonuk Could you please add test cases here so that we can update the code accordingly. Thank you.

mkundu1 commented 1 month ago

@hpohekar api_objects.json Above is the synsets section of the generated .json file. Few improvements we need there:

  1. Remove duplicate values from synset list.
  2. Remove empty synset lists.
  3. Remove the api word (for which we are looking for synsets) from the synset list.

After we compact this section, we should review the generated synsets. We probably need a stricter algorithm for generating synsets to avoid synonyms like mercantile_establishment.

mkundu1 commented 1 month ago

@hpohekar Please see #3393 for a test for the scenario mentioned by Sean.

mkundu1 commented 1 month ago

@hpohekar Looking at the top the level search function, we can process the keywords roughly in the following order:

def search(search_string: .., language: .., wildcard: .., match_whole_word: .., match_case: ..):
    <warn invalid keyword combinations>
    <get api_data>
    results = []
    if wildcard:
        results = _search_wildcard(search_string, api_data)
    elif match_whole_word:  # assuming match_case is a sub-option for match_whole_word
        results = _search_non_semantic(search_string, match_whole_word, match_case, api_data)
    else:
        results = _search_semantic(search_string, language, api_data)
    <print results>

This will avoid the expensive _search_semantic call when user has passed match_whole_word=True.

mkundu1 commented 1 month ago

Another issue:

>>> search("contours", match_whole_word=True) 
D:\ANSYSDev\PyFluentDev\pyfluent\src\ansys\fluent\core\search.py:650: UserWarning: ``match_whole_word=True`` matches the whole word (case insensitive).
  warnings.warn(
<solver_session>.setup.models.discrete_phase.injections["<name>"].physical_models.particle_drag.shape_factor (Parameter)
<solver_session>.results.graphics.contours (Object)
<solver_session>.results.graphics.contours.auto_scale (Parameter)
...
  1. The warning should not be there.
  2. The results include semantic search results. The first match is coming from semantic match of "contour" with "shape".