flairNLP / fabricator

[EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.
Apache License 2.0

Execution speed degrades as the size of the dataset to be annotated increases #48

Closed fhamborg closed 1 year ago

fhamborg commented 1 year ago

During the annotation of a 10k-example dataset, the speed degrades. I'm unsure whether this is due to OpenAI's API (e.g., rate limits) or caused by our code (e.g., non-linear complexity, possibly in the _inner_loop function?).

HallerPatrick commented 1 year ago

I can look into this

HallerPatrick commented 1 year ago

Looks rather constant to me over 25k (unlabelled text generation)

[Figures: Figure_1, figure_2]
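
For anyone who wants to repeat this kind of check, here is a minimal sketch of one way to test whether per-item time grows as more items are processed. It is not the script used for the figures above. The names `time_per_item`, `slope`, and `process` are hypothetical, and `process` stands in for whatever annotates or generates a single example.

```python
import time
from statistics import mean


def time_per_item(process, items, window=100):
    """Return the mean seconds per item for each consecutive window of items."""
    window_means = []
    durations = []
    for item in items:
        start = time.perf_counter()
        process(item)  # hypothetical per-example call (assumption, not fabricator's API)
        durations.append(time.perf_counter() - start)
        if len(durations) == window:
            window_means.append(mean(durations))
            durations = []
    if durations:  # keep the final, possibly shorter, window
        window_means.append(mean(durations))
    return window_means


def slope(values):
    """Least-squares slope of the values plotted against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

A slope close to zero over the window means is consistent with constant speed, while a clearly positive slope would point to a slowdown. The measurement alone does not say whether the cause is the API or the local code; running it once with a stub `process` that skips the API call would be one way to separate the two.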

fhamborg commented 1 year ago

Thanks for looking into this. Could very well be, just had this feeling when randomly looking at the progress. We can close the issue then.

Patrick Haller @.***> wrote on Mon, 24 July 2023, 15:49:

Looks rather constant to me over 25k (unlabelled text generation)

Figure_1.png (view on web): https://github.com/flairNLP/ai-dataset-generator/assets/22773355/b3579bd7-8d42-4cc6-8435-c7c92f73febc
figure_2.png (view on web): https://github.com/flairNLP/ai-dataset-generator/assets/22773355/aacfe9cb-35e4-4573-99c1-3916a0054b48
