gkamradt / ChunkViz

Visualize Different Text Splitting Methods

Suggestions for new chunkers #4

Open do-me opened 3 months ago

do-me commented 3 months ago

Hey, super useful tool!

There's been some development in the chunking community. If you'd like to keep your app up to date, here are a few suggestions. Also, considering that all of the current options struggle to identify sentence boundaries correctly (I quickly tested with some texts) and tend to chop off parts, it would be nice to have more choice.

Python

JS

Another idea would be to add an option that accepts any user-supplied regex, like we did in SemanticFinder (https://github.com/do-me/SemanticFinder). I tried to come up with a good regex for sentence boundaries, but it's incredibly hard.
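
To illustrate both the idea and why it's hard, here is a minimal sketch of a regex-driven splitter in Python. The function name `split_by_regex` and the pattern `NAIVE_SENTENCE_BOUNDARY` are made up for this example; this is not how SemanticFinder or ChunkViz implement it, just one way such an option might behave.

```python
import re

# Hypothetical helper: split text wherever a user-supplied regex matches.
def split_by_regex(text: str, pattern: str) -> list[str]:
    chunks = re.split(pattern, text)
    # re.split can yield None or empty strings (e.g. with capturing groups),
    # so drop those and trim surrounding whitespace.
    return [c.strip() for c in chunks if c and c.strip()]

# Naive assumption: a sentence ends at whitespace preceded by '.', '!' or '?'.
NAIVE_SENTENCE_BOUNDARY = r"(?<=[.!?])\s+"

sample = "Dr. Smith arrived at 5 p.m. today. He was late! Why?"
print(split_by_regex(sample, NAIVE_SENTENCE_BOUNDARY))
# ['Dr.', 'Smith arrived at 5 p.m.', 'today.', 'He was late!', 'Why?']
```

The abbreviations "Dr." and "p.m." are wrongly treated as sentence ends, which is the kind of chopping described above. Letting users supply their own pattern at least allows them to tune it to their texts.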

gkamradt commented 2 months ago

Thank you for this - all of these sound good.

I haven't had time to improve the tool recently but I'd love help on it if you're up for it.
