do-me / SemanticFinder

SemanticFinder - frontend-only live semantic search with transformers.js
https://do-me.github.io/SemanticFinder/
MIT License
210 stars 14 forks source link

Use similarity js package #26

Closed lizozom closed 1 year ago

lizozom commented 1 year ago

I improved the similarityjs package based on @VarunNSrivastava 's work. This PR uses this package with an identical user experience. If we want to, we can then improve on it later!

This PR also contains a pre commit hook that runs a linter to make sure that our code is consistent.

varunneal commented 1 year ago

This looks very good! I see that there are two versions of split_text.js. Additionally the loading progress bar seems a little bit off. I'd like to fix the latter but I'm a bit unclear on how the progress callback is currently working:

    await init({
        modelName: 'Xenova/all-MiniLM-L6-v2',
    }, (progress) => {
        if (progress.status === 'done') {
            console.log(`Loaded ${progress.name} (${progress.file})`);
        }
    });
lizozom commented 1 year ago

The split_text file in the semanticjs package inside the demo folder, which is just for easier debugging. Is that what you meant?

As for the progress callback, it returns the same object it did before. In the package it's typed as LoadingProgress.

The rest of the logic I copied from your code, and it's handled with the rest of the UI in this project.:

The callback function is here. And it's used when loading the model.

Does this make sense?

varunneal commented 1 year ago

Thanks for pointing it out! It seems very well written. Would you want to integrate this library as part of this repo (with @do-me's agreement, of course)? For displaying the progress % on the loading bar, it's probably unnecessary to do it to two decimal places.

do-me commented 1 year ago

@VarunNSrivastava yes of course, from my perspective it would be nice to have everything under one roof! @lizozom what do you think?

lizozom commented 1 year ago

So you mean that this project should become installable using npm install semanticfinder?

Normally libraries don't really live in the same repo as their demos (especially if it's a demo as detailed as this one)

I think it makes more sense to split out the UI logic and the core embedding logic, as people interested in that code, don't really want to install the demo page a dependency.

do-me commented 1 year ago

How I understand that was kind of the idea. Maybe you're more familiar with npm, but couldn't we use some kind of flag for this, like "with or without GUI" in some way? Else, we could have two branches in the repo, the main branch with only the npm package, then the "GUI" branch (and the one for GitHub pages with dist files).

@VarunNSrivastava or how did you think to integrate the library in this repo?

@lizozom for me it's also totally fine if the core logic lives in a different repo, we could surely go ahead this way. The only thing that would be nice is if our affiliation @VarunNSrivastava 's and mine wouldn't be "lost" in semantic-js. Do you maybe have an idea how to deal with this?

varunneal commented 1 year ago

I was thinking along the line of the transformers.js repo having the demo site in the same library as the package. I imagine I'll be doing any future development of this project on this repo, so a merge would be to contain any divergence between the package and the demo.

do-me commented 1 year ago

I'll be closing this PR for the moment as we'd need to discuss some more (technical/organizational) details first. Having an installable npm package would be very nice and we can always re-open this discussion - I'd propose to head over to the actual "discussion" section for this.

@lizozom

Normally libraries don't really live in the same repo as their demos

I was reflecting about what SemanticFinder is and I think it certainly outgrew the "demo" stage. It has become:

I agree with @VarunNSrivastava that SemanticFinder could be organized like the transformers.js repo. However it's also my first time to work with this level of complexity on a monorepo so if you have proposals or other ideas they would be very welcome!