dlt-hub / verified-sources


Type is not JSON serializable: Timestamp #484

Open kiwamizamurai opened 1 month ago

kiwamizamurai commented 1 month ago

dlt version

0.4.10

Describe the problem

I cannot run ELT from MongoDB Atlas to BigQuery. The pipeline fails during ingestion from MongoDB.

The problem appears to originate in the dependent package orjson, and there is an open issue for it: https://github.com/ijl/orjson/issues/442

Error message

<class 'TypeError'>
Type is not JSON serializable: Timestamp
 {'allUsers': [{'db': 'admin', 'user': 'hogehoge'}],
  'appName': 'MongoDB Compass',
  'client': 'xxxxxxxxxxx',
  'command': {'$clusterTime': {'clusterTime': Timestamp(1686813362, 110),
                               'signature': {'hash': b'xxxxxxxx',
                                             'keyId': xxxxxxxxxxxx}},
              '$db': 'test',
              'aggregate': 'fugafuga',
              'cursor': {},
TypeError: Timestamp(1686813362, 110) is not JSON serializable

https://mongodb.github.io/node-mongodb-native/4.0/interfaces/clustertime.html

Expected behavior

No error occurs and the ELT run finishes successfully.

Steps to reproduce

working on it

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

mongo

dlt destination

Google BigQuery

Other deployment details

I followed this article.

Additional information

No response

sultaniman commented 1 month ago

Hey @kiwamizamurai, thanks for the feedback. Are you using our mongodb source? If so, you can adjust it and make the conversion explicit, like this:

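import datetime as _datetime
from typing import Any
from bson.decimal128 import Decimal128
from bson.objectid import ObjectId
from dlt.common.time import ensure_pendulum_datetime

def convert_mongo_objs(value: Any) -> Any:
    if isinstance(value, (ObjectId, Decimal128)):
        return str(value)
    if isinstance(value, _datetime.datetime):
        return ensure_pendulum_datetime(value)
    return value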
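
For illustration, here is a minimal sketch of how such an explicit conversion could be extended to `Timestamp` values nested inside a document, which is the shape shown in the error message. The `Timestamp` class below is only a stand-in so the snippet runs without pymongo; the real `bson.timestamp.Timestamp` exposes the same `time`, `inc`, and `as_datetime()` members. `convert_value` is a hypothetical helper name and not part of the dlt source.

```python
import datetime
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Timestamp:
    """Stand-in for bson.timestamp.Timestamp (illustration only)."""

    time: int  # seconds since the Unix epoch
    inc: int   # ordinal counter within that second

    def as_datetime(self) -> datetime.datetime:
        # Mirrors the real method: only the time portion is kept, in UTC.
        return datetime.datetime.fromtimestamp(self.time, tz=datetime.timezone.utc)


def convert_value(value: Any) -> Any:
    """Recursively replace Timestamp values with timezone-aware datetimes."""
    if isinstance(value, Timestamp):
        return value.as_datetime()
    if isinstance(value, dict):
        return {key: convert_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_value(item) for item in value]
    return value


doc = {"command": {"$clusterTime": {"clusterTime": Timestamp(1686813362, 110)}}}
converted = convert_value(doc)

# The stdlib json module cannot encode datetime either, so isoformat() is
# used here purely to print the converted document.
print(json.dumps(converted, default=lambda obj: obj.isoformat()))
```

Converting to a datetime drops the `inc` counter. If ordering within a single second matters, returning both fields (for example as a small dict) would be an alternative.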

rudolfix commented 1 month ago

@IlyaFaer my take is that we are not translating several basic BSON types to Python types. Could you take a look?

rudolfix commented 3 weeks ago

What we need is:

  1. a test that puts every possible MongoDB datatype into a collection and tries to retrieve it
  2. to see which types can actually be decoded into the Python dict/list item format
  3. to see which types can be decoded into Arrow
  4. and then to fix our decoder and, possibly, the type casting for Arrow
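
The first two steps could start from something like the sketch below: build one document holding a sample of each type, then report which values the serializer rejects. This is only an assumption about how such a test might look. It uses the stdlib `json` module as a stand-in for the pipeline's serializer and a plain dict instead of a live MongoDB collection. `find_unserializable` and `FakeTimestamp` are hypothetical names introduced for the example.

```python
import datetime
import json
from decimal import Decimal
from typing import Any, Dict


class FakeTimestamp:
    """Hypothetical stand-in for a BSON type with no JSON encoding."""


def find_unserializable(value: Any, path: str = "$") -> Dict[str, str]:
    """Map the path of each value json.dumps rejects to its type name."""
    failures: Dict[str, str] = {}
    if isinstance(value, dict):
        for key, item in value.items():
            failures.update(find_unserializable(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            failures.update(find_unserializable(item, f"{path}[{index}]"))
    else:
        try:
            json.dumps(value)
        except TypeError:
            failures[path] = type(value).__name__
    return failures


# One sample value per type; a real test would cover every BSON type.
sample = {
    "name": "fugafuga",
    "count": 3,
    "created": datetime.datetime(2023, 6, 15, tzinfo=datetime.timezone.utc),
    "price": Decimal("1.50"),
    "hash": b"xxxxxxxx",
    "cluster_time": [FakeTimestamp()],
}

report = find_unserializable(sample)
for location, type_name in sorted(report.items()):
    print(location, type_name)
```

Each entry in the report points at a type the decoder would need to handle. The same loop could be repeated against the Arrow conversion for step 3.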