Palashio / libra

Ergonomic machine learning for everyone.
http://libradocs.org/
MIT License

documentation for summarization_query is unclear #357

Closed fredzannarbor closed 3 years ago

fredzannarbor commented 3 years ago

I don’t understand the documentation for summarization_query.

Automatically fits a transfer-learning Document Summarization model to your dataset. This model will have frozen layers with pretrained weights to help with small dataset sizes. Stored as 'doc_summarization' in models dictionary.

Dataset Guidelines

The data that you want to summarize should be the target of the instruction. So if you want to summarize tweets, the instruction could be 'summarize long textual tweets'.
The result, or the summary should be in a column called 'summary'. THIS IS ESSENTIAL.
Your instruction should be about the label column, not the text.

Why does it need a “summary” column? All I have is the original text. I want the model to generate the summary! I would like to be able to summarize the entire data set, or to summarize each row.

https://libradocs.github.io/html/nlp.html

anas-awadalla commented 3 years ago

The summarization query fine-tunes a model on your dataset, so it needs example summaries (the 'summary' column) to train against. If you only want to run inference, I would look into the Hugging Face Transformers pipelines, which provide an easy way to try pretrained models on different tasks: https://huggingface.co/transformers/main_classes/pipelines.html#transformers.SummarizationPipeline
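For the inference-only case, a minimal sketch might look like the following. It assumes the transformers package is installed, that the installed version still ships the "summarization" pipeline task, and that the library's default summarization checkpoint is acceptable. The helper names (load_summarizer, summarize_rows, summarize_dataset, demo) are made up for illustration and are not part of libra or transformers.

```python
from typing import Callable, Iterable, List


def load_summarizer() -> Callable[[str], str]:
    """Wrap a pretrained Hugging Face summarization pipeline as a plain function.

    Assumption: `transformers` is installed; the model is downloaded on first use.
    """
    from transformers import pipeline  # imported lazily so the helpers below work without it

    pipe = pipeline("summarization")  # no model named, so the library default is used

    def summarize(text: str) -> str:
        # truncation=True clips inputs longer than the model's maximum length
        result = pipe(text, truncation=True)
        return result[0]["summary_text"]

    return summarize


def summarize_rows(texts: Iterable[str], summarize: Callable[[str], str]) -> List[str]:
    """Return one summary per row of text."""
    return [summarize(text) for text in texts]


def summarize_dataset(
    texts: Iterable[str], summarize: Callable[[str], str], separator: str = "\n\n"
) -> str:
    """Join all rows into one document and summarize it once."""
    return summarize(separator.join(texts))


def demo() -> None:
    """Hypothetical usage; the rows here are placeholder text, not real data."""
    rows = [
        "First long piece of text that needs a summary ...",
        "Second long piece of text that needs a summary ...",
    ]
    summarizer = load_summarizer()
    print(summarize_rows(rows, summarizer))     # one summary per row
    print(summarize_dataset(rows, summarizer))  # one summary for everything
```

Calling demo() runs both variants. Note that with truncation enabled, summarize_dataset only sees the start of a long joined document, so for large datasets summarizing row by row (or in chunks) is likely the safer choice.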

Closing for now, but feel free to reopen if this is still an issue.