plurality of formats

Hi Adrian. Thanks for your 192-foil Best Hits; it was a welcome respite from reality for an hour or so.

Pedantic issues and impressions: Is this big enough to count as BigData? I thought training an LLM required many gigabytes of fodder. The many different formats might also confound it, unless there's an LLM with polyglot format-conversion capabilities on par with or better than SearchIt of yore. Beyond just extracting text from PPTs and PDFs, the visually aphoristic style of presos and the missing duality of dialogue seem likely to confound 'sensible' machine interpretation.

The endless, non-auto-wrapped lines in the .txt versions of the Medium posts make human-in-the-loop (HiTL) browsing of those .txt files harder than it needs to be.

I downloaded the PDFs and .pptx files to view them, since GitHub apparently doesn't grok them. I'm guessing one could view the .txt files in a browser, or turn on auto-wrap in TextEdit.
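Alternatively, the .txt files could be rewrapped once, locally. Below is a minimal Python sketch using the standard `textwrap` module. It assumes each long line in a file is one paragraph and that blank lines separate paragraphs. The function name `rewrap_text` and the 80-column default are my own choices for illustration, not anything in the repo.

```python
import textwrap


def rewrap_text(text: str, width: int = 80) -> str:
    """Wrap each long line (assumed to be one paragraph) to `width` columns.

    Blank lines are kept as-is so paragraph breaks survive the rewrap.
    """
    wrapped_lines = []
    for line in text.splitlines():
        if line.strip():
            # textwrap.fill breaks a single paragraph into lines of at most `width` chars
            wrapped_lines.append(textwrap.fill(line, width=width))
        else:
            wrapped_lines.append("")
    return "\n".join(wrapped_lines)
```

Usage, with a made-up file name: `print(rewrap_text(open("medium_post.txt", encoding="utf-8").read()))`.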

That's all as far as 'issues' go. Hope you are well.

This adventure also motivated me to sign up for Mastodon. Lots of cats.

Interestingly, I harbor a similar objective: training -something- on thousands of handwritten 4x6 notecards. The less-than-chatbot objective is simple OCR of my own handwriting plus full-text search and retrieval, and maybe a tag-cloud Zettelkasten.
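For the search and retrieval part, a plain inverted index over the OCR output may go a long way before any model is involved. Here is a minimal Python sketch. It assumes the OCR step has already produced one string per card, keyed by a hypothetical card ID, and that tags are assigned by hand. All names are illustrative.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(cards: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of card IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for card_id, text in cards.items():
        for word in tokenize(text):
            index[word].add(card_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of cards that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # .get avoids inserting empty entries into the defaultdict
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def tag_cloud(card_tags: dict[str, list[str]]) -> Counter:
    """Count how often each (hand-assigned) tag appears, e.g. as tag-cloud weights."""
    return Counter(tag for tags in card_tags.values() for tag in tags)
```

For example, with `cards = {"card-001": "Zettelkasten links ideas", "card-002": "OCR of handwriting is hard"}`, calling `search(build_index(cards), "ocr handwriting")` returns `{"card-002"}`. An AND search keeps results narrow, which suits a few thousand short cards. Ranking (e.g. TF-IDF) could be layered on later if needed.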

There are hundreds of topics I'd like to bounce off you :^) but I guess those will have to wait.

adrianco / meGPT

plurality of formats #14