datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com
Other

ImportFactory error with optional dependencies #286

Closed: pierre-monnet closed this issue 2 days ago

pierre-monnet commented 4 days ago

To reduce Databricks start-up time, I install only the optional dependency group datacontract-cli[databricks]==0.10.8 (as mentioned in #262).
However, when I call import_from_source with the Glue format, I get an error about a missing Avro dependency. I don't want to pull in Avro at all, because I will never use that importer.
The problem comes from the way the ImportFactory is designed: it imports every *_importer module up front, and therefore all of their dependencies.
What do you think about redesigning this class so that it only imports the *_importer that is actually needed?

This is my code snippet:

from datacontract.data_contract import DataContract

contract = DataContract(spark=spark).import_from_source(
    format="glue",
    source="my_db"
)

This is the error returned:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/datacontract/data_contract.py:17
     15 from datacontract.export.exporter import ExportFormat
     16 from datacontract.export.exporter_factory import exporter_factory
---> 17 from datacontract.imports.avro_importer import import_avro
     18 from datacontract.imports.bigquery_importer import import_bigquery_from_api, import_bigquery_from_json
     19 from datacontract.imports.glue_importer import import_glue
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/datacontract/imports/avro_importer.py:1
----> 1 import avro.schema
      3 from datacontract.model.data_contract_specification import DataContractSpecification, Model, Field
      4 from datacontract.model.exceptions import DataContractException

teoria commented 4 days ago

I got it! I think we need to add a check before registering each importer with the factory. I'll give it a try.

teoria commented 4 days ago
import importlib.util
if importlib.util.find_spec("datacontract.imports.avro_importer"):
    from datacontract.imports.avro_importer import AvroImporter
    importer_factory.register_importer(ImportFormat.avro, AvroImporter)

@jochenchrist, what do you think?

teoria commented 4 days ago

@pierre-monnet https://github.com/datacontract/datacontract-cli/pull/288 could you test with this branch?

pierre-monnet commented 3 days ago

Same error :/


File <command-1477724087541722>, line 4
      2 sys.path.append(os.path.abspath('datacontract-cli/'))
      3 #from datacontract.export.html_export import to_html
----> 4 from datacontract.data_contract import DataContract
      6 contract = DataContract(spark=spark).import_from_source(
      7     format="glue",
      8     source="my_db"
      9 )
File datacontract-cli/datacontract/data_contract.py:17
     15 from datacontract.export.exporter import ExportFormat
     16 from datacontract.export.exporter_factory import exporter_factory
---> 17 from datacontract.imports.importer_factory import importer_factory
     19 from datacontract.integration.publish_datamesh_manager import publish_datamesh_manager
     20 from datacontract.integration.publish_opentelemetry import publish_opentelemetry
File datacontract-cli/datacontract/imports/importer_factory.py:21
     18 importer_factory = ImporterFactory()
     20 if importlib.util.find_spec("datacontract.imports.avro_importer"):
---> 21     from datacontract.imports.avro_importer import AvroImporter
     23     importer_factory.register_importer(ImportFormat.avro, AvroImporter)
     25 if importlib.util.find_spec("datacontract.imports.bigquery_importer"):
File datacontract-cli/datacontract/imports/avro_importer.py:1
----> 1 import avro.schema
      3 from datacontract.imports.importer import Importer
      4 from datacontract.model.data_contract_specification import DataContractSpecification, Model, Field

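The traceback shows why the guard at line 20 of importer_factory.py does not help: `importlib.util.find_spec("datacontract.imports.avro_importer")` only checks that the importer module itself can be located, and that file ships with datacontract-cli whether or not Avro is installed. The statement that actually fails is `import avro.schema` inside it. A minimal sketch of a guard that checks the third-party package instead; `dependency_installed` and `available_formats` are hypothetical helper names, and the format-to-package table is an assumption for illustration, not something datacontract-cli is known to keep.

```python
import importlib.util


def dependency_installed(package: str) -> bool:
    """Return True if a top-level package can be found on sys.path.

    find_spec() locates the package without executing it. Pass a top-level
    name such as "avro": for a dotted name like "avro.schema", find_spec()
    imports the parent package first and raises ModuleNotFoundError if the
    parent is missing.
    """
    return importlib.util.find_spec(package) is not None


def available_formats(requirements: dict[str, str]) -> list[str]:
    """Return the formats whose required third-party package is installed.

    `requirements` is a hypothetical table mapping a format name to the
    top-level package that format needs.
    """
    return [fmt for fmt, pkg in requirements.items() if dependency_installed(pkg)]


# Checking the importer module would always be truthy because that file is
# part of datacontract-cli; checking "avro" reflects what is really installed.
# Hypothetical wiring into the factory, reusing the names from the snippet above:
# if dependency_installed("avro"):
#     from datacontract.imports.avro_importer import AvroImporter
#     importer_factory.register_importer(ImportFormat.avro, AvroImporter)
```

The same effect can be had with `try: ... except ImportError:` around the import; `find_spec` has the advantage of not executing the importer module just to find out that it cannot be used.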
pierre-monnet commented 3 days ago

@teoria I proposed a change in #291. The code structure might be dirty, but it works.

teoria commented 3 days ago

@pierre-monnet https://github.com/datacontract/datacontract-cli/pull/292

Take a look at this: I now use a dynamic import, which I hope solves your issue.

Could you close your PR with the fix?
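The code in PR #292 is not reproduced in this thread, so the following is only a minimal sketch of the general "dynamic import" technique, assuming a registry that stores module paths instead of already-imported classes. `LazyImporterFactory`, `register_lazy` and `create` are hypothetical names, not datacontract-cli's API.

```python
import importlib
from typing import Any


class LazyImporterFactory:
    """Hypothetical factory that defers importing each importer module.

    Registration stores only strings, so registering an importer whose
    optional dependency (e.g. avro) is missing never fails. The import
    happens in create(), and only for the format that was requested.
    """

    def __init__(self) -> None:
        self._targets: dict[str, tuple[str, str]] = {}

    def register_lazy(self, fmt: str, module_name: str, class_name: str) -> None:
        """Remember where the importer class for `fmt` lives, without importing it."""
        self._targets[fmt] = (module_name, class_name)

    def create(self, fmt: str, *args: Any, **kwargs: Any) -> Any:
        """Import the module for `fmt` on first use and instantiate its class."""
        if fmt not in self._targets:
            raise ValueError(f"Unknown import format: {fmt!r}")
        module_name, class_name = self._targets[fmt]
        # A missing optional dependency surfaces here as ModuleNotFoundError,
        # but only when this particular format is actually used.
        module = importlib.import_module(module_name)
        return getattr(module, class_name)(*args, **kwargs)


# Usage with standard-library stand-ins; in datacontract-cli a target would
# look something like ("datacontract.imports.avro_importer", "AvroImporter").
factory = LazyImporterFactory()
factory.register_lazy("ordered", "collections", "OrderedDict")
factory.register_lazy("missing", "not_an_installed_module", "Whatever")
print(type(factory.create("ordered")).__name__)  # OrderedDict
```

Registering "missing" succeeds; only `create("missing")` raises. Applied to this issue, a Glue import would never touch `avro_importer`, so the `databricks` extra alone would be enough. The trade-off is that a missing dependency is reported on first use of that format rather than when `datacontract` is imported.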

pierre-monnet commented 2 days ago

Thanks @teoria for the fix.