gazayas closed this 1 year ago.
I opened this as a WIP for a few reasons:
Okay, this one should be good to go. With the last commit, I automatically set up the field mappings with the values passed via the select inputs. I don't think we were actually doing this before. For example, say you have an attribute called "Name": against `main`, you can choose "Don't import" for that column, but it will still be mapped to "Name".
Besides that, system tests are passing, so I'll mark this one as ready!
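To illustrate the mapping behavior described above, here's a minimal sketch in plain Ruby. The method name `build_field_mappings`, the `DONT_IMPORT` label, and the shape of the submitted values (a hash of CSV column name => selected attribute) are assumptions for illustration, not the actual code in this PR.

```ruby
# Hypothetical sketch: build the CSV column => attribute mapping only from the
# values submitted by the select inputs, so "Don't import" really skips a column.
DONT_IMPORT = "Don't import" # assumed label of the select option

def build_field_mappings(selected_values)
  selected_values.each_with_object({}) do |(column, choice), mappings|
    # A blank or "Don't import" choice maps the column to nil instead of
    # falling back to an attribute with the same name as the column.
    skip = choice.nil? || choice.strip.empty? || choice == DONT_IMPORT
    mappings[column] = skip ? nil : choice
  end
end

mappings = build_field_mappings({ "Name" => "Don't import", "Email" => "email" })
p mappings
```

The point of the design is that the submitted select value is the single source of truth, so a column named "Name" is never mapped to the "Name" attribute unless someone actually picked it.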
@gazayas If we still need this PR can you resolve the conflict?
@jagthedrummer Sorry for the delay on this one, the merge conflict has been resolved.
It stemmed from https://github.com/bullet-train-pro/bullet_train-action_models/commit/a8ff0bcc7befd89866e41d4a8f163e55f94d62e1, where we put `analyze_file` in a private method at the bottom of `performs_import.rb`. The original OpenAI code I wrote in this PR was in that method, so I moved it to the proper place in the file.
The test failures seem to be due to a Google Chrome issue.
I got the Chrome issue fixed up in `main`, then merged `main` into this branch, and now everything is looking good. 👍
Closes #61.
Depends on:
Details
OpenAI uses embeddings to represent strings as vectors of numbers, so how closely two strings match can be scored as a single number between -1 and 1.
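For intuition on where the -1 to 1 range comes from: OpenAI's docs recommend comparing embeddings with cosine similarity, which is the dot product of two vectors divided by the product of their lengths. Here's a minimal sketch using Ruby's standard `matrix` library; the three-element vectors are made-up stand-ins for real embeddings, which come from the API and have far more dimensions.

```ruby
require "matrix"

# Cosine similarity: dot product divided by the product of the vector lengths.
# The result always falls between -1 (opposite) and 1 (same direction).
def cosine_similarity(a, b)
  a.dot(b) / (a.norm * b.norm)
end

# Toy vectors standing in for real embeddings (assumption: real ones are
# returned by the embeddings API and are much longer).
name      = Vector[0.9, 0.1, 0.2]
full_name = Vector[0.8, 0.2, 0.25]
email     = Vector[-0.1, 0.9, 0.3]

puts cosine_similarity(name, full_name)
puts cosine_similarity(name, email)
```

The first score is close to 1 and the second is close to 0, which is the kind of gap we rely on when picking the closest attribute for a column.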
OpenAI has different models available that we can run our data against, and you can run the following in the Rails console to see which ones are available:
I went with `text-similarity-babbage-001`, but there are other text similarity models available, so we can change it if necessary.
Vectors
In the embeddings link above, you can see that the following code can be used to compare the data:
Ruby has the `Vector#dot` method available to achieve the same thing, so I used it in the `closest_attribute_matches` method I wrote to get the CSV column name, the closest match from the model, and the similarity score:

The docs suggest using cosine similarity as the distance function, but they also say "The choice of distance function typically doesn’t matter much," so I don't think we have to worry about using anything besides `dot`. The same docs also note that OpenAI embeddings are normalized to length 1, so the dot product of two embeddings already equals their cosine similarity.
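Here's a minimal sketch of what a method like `closest_attribute_matches` could look like, assuming the column and attribute embeddings have already been fetched and stored as `Vector`s. The hash shapes, the `to_unit` helper, and the toy vectors are illustrative assumptions, not the code in this PR. The toy vectors are normalized to length 1 to mimic real embeddings, which is what makes `dot` equivalent to cosine similarity.

```ruby
require "matrix"

# Hypothetical helper: scale a vector to length 1, mimicking the normalization
# OpenAI applies to its embeddings.
def to_unit(vector)
  vector / vector.norm
end

# column_embeddings:    { "CSV column name" => unit Vector }
# attribute_embeddings: { "attribute name"  => unit Vector }
# Returns [column_name, closest_attribute, similarity_score] for each column.
def closest_attribute_matches(column_embeddings, attribute_embeddings)
  column_embeddings.map do |column, column_vector|
    # For unit vectors, the dot product is the cosine similarity.
    scores = attribute_embeddings.map { |name, attribute_vector| [name, column_vector.dot(attribute_vector)] }
    attribute, score = scores.max_by { |_name, similarity| similarity }
    [column, attribute, score]
  end
end

columns = {
  "Full Name" => to_unit(Vector[0.8, 0.2, 0.25]),
  "E-mail"    => to_unit(Vector[-0.1, 0.9, 0.35])
}
attributes = {
  "name"  => to_unit(Vector[0.9, 0.1, 0.2]),
  "email" => to_unit(Vector[-0.1, 0.9, 0.3])
}

closest_attribute_matches(columns, attributes).each do |column, attribute, score|
  puts "#{column} -> #{attribute} (#{score.round(3)})"
end
```

With real data, each vector would come straight from the embeddings API response rather than being typed in by hand.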