ThirdAILabs / Demos

Notebooks for ThirdAI demos

Demos (Deprecated)

This is the deprecated repository containing interactive notebooks for exploring the ThirdAI Python library. Starting September 1, 2024, the ThirdAI Python package will no longer be supported. You can access all of the package's functionality, and a lot more, on the new ThirdAI Platform.

[Website] · [Report Issues] · [Careers]

👋 Welcome

All of ThirdAI's technology is powered by its BOLT library. BOLT is a deep-learning framework that leverages sparsity to enable training and deploying very large-scale deep learning models on any CPU. This demos repo will help you get familiar with our products NeuralDB and Universal Deep Transformer (UDT) through interactive notebooks.

🧠 NeuralDB (for RAG and Search)

NeuralDB is an efficient, private, teachable, CPU-only text retrieval engine. You can insert all your PDFs, DOCXs, and CSVs (and even parse URLs) into a NeuralDB and run semantic search and Q&A on them. Read our three-part blog on why you need NeuralDB here. Building on over a decade of research in efficient neural network training, NeuralDB has been optimized to run on conventional CPUs, making it usable on any standard desktop machine. Additionally, since it can be trained and used anywhere, NeuralDB gives you air-gapped privacy, ensuring your data never leaves your local machine.

With the capacity to scale Retrieval Augmented Generation (RAG) capabilities over thousands of pages, NeuralDB revolutionizes the way you interact with your data.

Here is a quick overview of how NeuralDB works:

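from thirdai import neural_db as ndb

# Create an empty NeuralDB
db = ndb.NeuralDB()

# Insert documents; each `filename` is a placeholder for the path to your own file
db.insert(
    sources=[ndb.PDF(filename), ndb.DOCX(filename), ndb.CSV(filename)],
    train=True,
)

# Retrieve the top 2 passages for a natural-language query
results = db.search(
    query="what is the termination period of this contract?",
    top_k=2,
)

for result in results:
    print(result.text)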
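
To connect the search step to RAG, the retrieved passages are usually pasted into a prompt for a language model. The following is a minimal sketch of that step, assuming only that each result exposes a `text` attribute as in the loop above. `Passage`, `build_rag_prompt`, and the sample passages are hypothetical stand-ins for illustration and are not part of the thirdai API.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """Hypothetical stand-in for a search result; only `text` is assumed."""
    text: str


def build_rag_prompt(question, results, max_chars=2000):
    """Number the retrieved passages, join them as context, then append the question."""
    context_parts = []
    used = 0
    for i, result in enumerate(results, start=1):
        snippet = result.text.strip()
        if used + len(snippet) > max_chars:
            break  # stop once the context budget is exhausted
        context_parts.append(f"[{i}] {snippet}")
        used += len(snippet)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up passages standing in for what a search might return.
results = [
    Passage("Either party may terminate with 30 days written notice."),
    Passage("This agreement is made by and between the parties."),
]
prompt = build_rag_prompt("what is the termination period of this contract?", results)
print(prompt)
```

The resulting string can be sent to whichever language model you use; the character budget is a crude way to keep the context within that model's input limit.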

NeuralDB also provides teaching methods for incorporating human feedback into RAG.

# associate a source with a target
db.associate(source="parties involved", target="made by and between")

# associate text with a result
db.text_to_result("made by and between", 0)

See the neural_db folder for more examples and documentation.

🪐 Universal Deep Transformer (for all Transformer and ML needs)

Universal Deep Transformer (UDT) is our consolidated API for performing different ML tasks on a variety of data types. It handles text, numeric, categorical, multi-categorical, graph, and time series data, and generalizes to tasks such as NLP, multi-class classification, multi-label retrieval, and regression. Just like NeuralDB, UDT is optimized for conventional CPUs and runs on any standard desktop machine.

Some applications of UDT include:

Here is an example of the UDT API used for multi-label tabular classification:

from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "title": bolt.types.text(),
        "category": bolt.types.categorical(),
        "number": bolt.types.numerical(range=(0, 100)),
        "label": bolt.types.categorical(delimiter=":")
    },
    target="label",
    n_target_classes=2,
    delimiter='\t',
)

# "filename.csv" is a placeholder for the path to your training file
model.train("filename.csv", epochs=5, learning_rate=0.001, metrics=["precision@1"])

model.predict({"title": "Red shoes", "category": "XL", "number": "12.6"})
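
The model above is configured with a tab column delimiter and a ":" delimiter inside the label column, so the training file needs to follow the same layout. Here is a minimal sketch of producing such a file with Python's standard `csv` module. The sample rows, the label values "0" and "1", and the helper names are assumptions made for illustration; check the notebooks for the exact label format your task expects.

```python
import csv
import os
import tempfile

# Column names match the data_types declared in the UDT example.
FIELDNAMES = ["title", "category", "number", "label"]

# Hypothetical rows; a label like "0:1" carries two classes joined by ":".
rows = [
    {"title": "Red shoes", "category": "XL", "number": 12.6, "label": "0:1"},
    {"title": "Blue hat", "category": "S", "number": 48.0, "label": "1"},
]


def write_training_file(path, rows):
    """Write rows as a tab-delimited file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_training_file(path):
    """Read the file back as a list of dicts to check the layout."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.tsv")
    write_training_file(path, rows)
    loaded = read_training_file(path)

for row in loaded:
    # Split the label column on ":" to recover the individual classes.
    print(row["title"], row["label"].split(":"))
```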

See the universal_deep_transformer folder for more examples and documentation.

📄 License

Many notebooks come with an API key that will only work on the dataset in the demo. If you want to try out ThirdAI on your own dataset, simply register for a free license here.

To use your license, activate it before constructing your NeuralDB or UDT models:

from thirdai import licensing

licensing.activate("") # insert your valid license key here

# create NeuralDB or UDT ...
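
To keep the key out of notebooks and version control, one option is to read it from an environment variable before activating. This is a small sketch under that assumption; the helper `load_license_key` and the variable name `THIRDAI_KEY` are hypothetical choices for illustration, not something the library requires.

```python
import os


def load_license_key(env_var="THIRDAI_KEY"):
    """Return the license key stored in an environment variable.

    The variable name is an assumption; use whatever name fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your license key."
        )
    return key
```

With the key loaded this way, the activation call shown above becomes `licensing.activate(load_license_key())`.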

Please refer to LICENSE.txt for more information on usage terms.

🎙 Contact

ThirdAILabs - @ThirdAILab - contact@thirdai.com