JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0
506 stars 40 forks

Feature/support for loading datasets from dlt within databricks #1148

Open chakravarthik27 opened 2 days ago

chakravarthik27 commented 2 days ago

This pull request changes langtest/datahandler/datasource.py and langtest/tasks/task.py to add support for Spark datasets and to improve file-extension handling. The main changes are a new SparkDataset class and updates to the __init__ and load methods to accommodate the new dataset type.
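For readers unfamiliar with the pattern, here is a minimal sketch of how a Spark-backed dataset class and an extension check could fit together. It is not the code from this pull request: the names SparkDatasetSketch, resolve_source_kind, and KNOWN_EXTENSIONS are hypothetical, and the sketch assumes only that the session object exposes a Spark-style table(name) method returning a DataFrame whose collect() yields rows with asDict().

```python
import os
from typing import Any, Dict, List

# Hypothetical set of file types handled by file-based loaders.
KNOWN_EXTENSIONS = {"csv", "json", "jsonl"}


def resolve_source_kind(data_source: str) -> str:
    """Return a file extension if it is a known one, otherwise 'spark'.

    Assumption: anything without a recognised extension is treated as a
    Spark table name such as 'catalog.schema.table'.
    """
    _, ext = os.path.splitext(data_source)
    ext = ext.lstrip(".").lower()
    return ext if ext in KNOWN_EXTENSIONS else "spark"


class SparkDatasetSketch:
    """Illustrative stand-in, not the SparkDataset class added in this PR."""

    def __init__(self, session: Any, table_name: str) -> None:
        # `session` is expected to behave like a pyspark SparkSession.
        self.session = session
        self.table_name = table_name

    def load(self) -> List[Dict[str, Any]]:
        # Read the table, then convert each Row to a plain dict.
        # collect() pulls every row to the driver, so this only suits
        # tables small enough to fit in memory.
        df = self.session.table(self.table_name)
        return [row.asDict() for row in df.collect()]
```

In a Databricks notebook the predefined spark session could be passed in directly, for example SparkDatasetSketch(spark, "main.default.samples").load(), where the table name is a placeholder.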

Support for Spark datasets:

Improvements to file extension handling:

Enhancements to task handling: