pierre-monnet closed 2 days ago
I got it! I think we need to add a check before registering the importer with the factory. I'll try:
import importlib.util

if importlib.util.find_spec("datacontract.imports.avro_importer"):
    from datacontract.imports.avro_importer import AvroImporter
    importer_factory.register_importer(ImportFormat.avro, AvroImporter)
@jochenchrist what do you think?
@pierre-monnet https://github.com/datacontract/datacontract-cli/pull/288 could you test with this branch?
Same error :/
File <command-1477724087541722>, line 4
2 sys.path.append(os.path.abspath('datacontract-cli/'))
3 #from datacontract.export.html_export import to_html
----> 4 from datacontract.data_contract import DataContract
6 contract = DataContract(spark=spark).import_from_source(
7 format="glue",
8 source="my_db"
9 )
File datacontract-cli/datacontract/data_contract.py:17
15 from datacontract.export.exporter import ExportFormat
16 from datacontract.export.exporter_factory import exporter_factory
---> 17 from datacontract.imports.importer_factory import importer_factory
19 from datacontract.integration.publish_datamesh_manager import publish_datamesh_manager
20 from datacontract.integration.publish_opentelemetry import publish_opentelemetry
File datacontract-cli/datacontract/imports/importer_factory.py:21
18 importer_factory = ImporterFactory()
20 if importlib.util.find_spec("datacontract.imports.avro_importer"):
---> 21 from datacontract.imports.avro_importer import AvroImporter
23 importer_factory.register_importer(ImportFormat.avro, AvroImporter)
25 if importlib.util.find_spec("datacontract.imports.bigquery_importer"):
File datacontract-cli/datacontract/imports/avro_importer.py:1
----> 1 import avro.schema
3 from datacontract.imports.importer import Importer
4 from datacontract.model.data_contract_specification import DataContractSpecification, Model, Field
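A note on why the guard in the traceback does not help: `importlib.util.find_spec` only locates a module, it does not execute it. `datacontract.imports.avro_importer` ships with the package, so the check passes, and the failure only appears when that module runs `import avro.schema`. A minimal sketch of checking the third-party dependency instead is below. The helper name `dependency_available` is made up for illustration and is not part of datacontract-cli.

```python
import importlib.util


def dependency_available(package_name: str) -> bool:
    """Return True if a top-level package can be located, without importing it.

    Hypothetical helper for illustration; not part of datacontract-cli.
    """
    return importlib.util.find_spec(package_name) is not None


# Guard on the optional dependency ("avro"), not on the importer module,
# because the importer module is always present in the installed package.
if dependency_available("avro"):
    print("avro is installed: the Avro importer can be registered")
else:
    print("avro is missing: skip registering the Avro importer")
```

This keeps the eager registration style but skips importers whose dependencies are not installed. It checks only a top-level package name; a dotted name such as `avro.schema` would make `find_spec` import the parent package and raise if it is absent.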
@teoria I proposed a change in #291. The code structure might be dirty, but it works.
@pierre-monnet https://github.com/datacontract/datacontract-cli/pull/292
Take a look at this: I now use a dynamic import, which I hope solves your issue.
Could you delete your fix PR?
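For readers following along, here is a rough sketch of what a dynamic (lazy) import in a factory can look like. It is not the code from the linked PR. The class `LazyImporterFactory` and the method `register_lazy` are hypothetical names, and standard-library modules stand in for the real importers so the example runs on its own.

```python
import importlib
from typing import Any, Dict, Tuple


class LazyImporterFactory:
    """Hypothetical factory that defers importing a module until it is requested."""

    def __init__(self) -> None:
        # format name -> (module path, class name); nothing is imported here
        self._registry: Dict[str, Tuple[str, str]] = {}

    def register_lazy(self, name: str, module_path: str, class_name: str) -> None:
        # Only strings are stored, so a missing dependency cannot fail at this point.
        self._registry[name] = (module_path, class_name)

    def create(self, name: str) -> Any:
        if name not in self._registry:
            raise ValueError(f"Unsupported import format: {name}")
        module_path, class_name = self._registry[name]
        # The import (and any ModuleNotFoundError) happens only for the requested format.
        module = importlib.import_module(module_path)
        return getattr(module, class_name)()


factory = LazyImporterFactory()
# Stand-ins for illustration: a module that exists and one that does not.
factory.register_lazy("available", "collections", "OrderedDict")
factory.register_lazy("unavailable", "no_such_module_for_demo_xyz", "Missing")

# Works even though the "unavailable" format points at a missing module.
instance = factory.create("available")
print(type(instance).__name__)
```

With this design, a Glue import would never touch the Avro module, so a missing `avro` package only matters to someone who actually asks for the Avro format.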
Thanks @teoria for the fix.
To reduce Databricks start time, I use the optional dependency `datacontract-cli[databricks]==0.10.8` (as mentioned in #262). But when I try to use `import_from_source` with the Glue format, I get an error about a missing Avro dependency. In fact, I don't want to import Avro because I will never use that importer. The problem comes from the way `ImporterFactory` is designed: it imports every `*_importer` module, and therefore all of their dependencies. What do you think about redesigning this class so that it only imports the needed `*_importer`?
This is my code snippet:
This is the error returned: