Closed varunneal closed 1 year ago
Here's the live site using github pages: https://varunnsrivastava.github.io/SemanticFinder/
This is so awesome, great work! Love the sidebar with the clickable results and it's even mobile-friendly 🎉 The threshold feature is fantastic to play around with!
Will merge it soon - first, I want to set up GitHub pages for the repo too (facilitates the workflow and deployment) and remove the site from my personal homepage.
Some very minor (mostly CSS-related) things:
primary and secondary buttons. Submit should be primary (the dark blue color nudging the user to click it), while the prev and next buttons should be secondary (just a blue outline). This guides the user better.
If you find a spare minute, feel free to modify any of the bullet points.
Update: Merged meanwhile.
Thanks for looking at my code! I'll take a look at these bullet points. My broader goal, which I've already spent a few hours on, is to do automatic semantic segmentation in the browser. That is, the parsing is completely automatic, with highlights based on textually relevant phrases. For that, it should be possible to drop "# chars" as a controllable parameter.
I found a floating "Results" title ugly, but feel free to add it. There should already be a bootstrap column where it can go.
Fantastic!
For the "automatic segmentation" part I have a few links, but unfortunately no definitive answer. It depends somewhat on what you're looking for, how long your input text is, how much time you have, and what your aim is (finding keywords or paragraphs?).
There is e.g. langchain JS with RecursiveCharacterTextSplitter that could come in handy.
Else, just to compare how other communities deal with it, there is haystack in Python.
It pretty much boils down to finding some kind of boundary (paragraph, sentence, word, character): if the text split at the current boundary still exceeds the model's input length, the next finer boundary is chosen to reduce the segment length. Is that what you had in mind?
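To make the boundary fallback concrete, here is a minimal sketch in plain JavaScript. It is not langchain's RecursiveCharacterTextSplitter and not SemanticFinder's code: the separator list, the function name `splitRecursively`, and the `maxLen` parameter (measured in characters rather than model tokens) are all assumptions chosen for illustration.

```javascript
// Boundaries from coarsest to finest: paragraph, sentence, word, character.
// Illustrative assumption, not the default of any particular library.
const SEPARATORS = ["\n\n", ". ", " ", ""];

// Split `text` into segments of at most `maxLen` characters. The coarsest
// boundary is tried first; only parts that are still too long are split
// again at the next finer boundary. Neighbouring short parts are greedily
// merged back together. Separators between emitted segments are dropped.
function splitRecursively(text, maxLen, level = 0) {
  if (maxLen < 1) throw new RangeError("maxLen must be at least 1");
  if (text.length <= maxLen) return text.trim() ? [text] : [];

  const sep = SEPARATORS[level];
  if (sep === "") {
    // Finest level: hard cut into fixed-size character windows.
    const windows = [];
    for (let i = 0; i < text.length; i += maxLen) {
      windows.push(text.slice(i, i + maxLen));
    }
    return windows;
  }

  const segments = [];
  let current = "";
  for (const part of text.split(sep)) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length <= maxLen) {
      current = candidate; // still fits, keep merging
      continue;
    }
    if (current) segments.push(current);
    if (part.length <= maxLen) {
      current = part;
    } else {
      // This part alone is too long: fall back to the next finer boundary.
      segments.push(...splitRecursively(part, maxLen, level + 1));
      current = "";
    }
  }
  if (current) segments.push(current);
  return segments;
}

const demo = splitRecursively(
  "One two three. Four five six seven.\n\nEight.",
  20
);
console.log(demo);
```

In a real setup the length check would more likely count model tokens than characters, and an "auto-mode" could derive `maxLen` from the model's input limit.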
I really like the idea of automating these complex things to make it easier for laypeople. Maybe it would be nice to have an "auto-mode" and "advanced settings" where you can fine-tune all the parameters.
I might end up using an approach similar to RecursiveCharacterTextSplitter, but the broader point of Semantic Segmentation is to specifically highlight/identify semantically relevant phrases. As an example, the current implementation selects:
So Hansel and Grethel sat by the fire, and at noon they each ate their pieces of bread. They thought their father was in the wood all the time,
Splitting the text by sentences might give us
So Hansel and Grethel sat by the fire, and at noon they each ate their pieces of bread. They thought their father was in the wood all the time,
which is an improvement. Splitting by commas, e.g. recursively, could get us the desired
So Hansel and Grethel sat by the fire, and at noon they each ate their pieces of bread. They thought their father was in the wood all the time,
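As a rough illustration of the two granularities on this passage, the sketch below only enumerates candidate segments. It does no semantic scoring and does not decide which segment gets highlighted. The helper names `splitSentences` and `splitClauses` and their regular expressions are assumptions for this example.

```javascript
const passage =
  "So Hansel and Grethel sat by the fire, and at noon they each ate " +
  "their pieces of bread. They thought their father was in the wood all the time,";

// Split after sentence-ending punctuation that is followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}

// Split after a comma that is followed by whitespace (a finer boundary).
function splitClauses(text) {
  return text.split(/(?<=,)\s+/).filter((s) => s.length > 0);
}

const sentences = splitSentences(passage);
const clauses = sentences.flatMap(splitClauses);

console.log(sentences); // sentence-level candidates
console.log(clauses);   // comma-level candidates
```

Each candidate could then be embedded and compared against the query, so that the highlight covers only the best-matching sentence or clause instead of a fixed-size character window.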
The basic algorithm I'm considering is:
No need to merge if unwanted. Maintained the style goal of sticking to one index.html file.