apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

MongoDB - Sink - PluginIdentifier not found #6800

Open a11dev opened 2 months ago

a11dev commented 2 months ago

What happened

I have a simple SeaTunnel configuration designed to sync Oracle to MongoDB. Oracle to Postgres is working. I added a new sink:

    MongoDB {
        source_table_name = "source"
        uri = "mongodb://user:pwd@ipaddress:27017/dbname?readPreference=secondary&slaveOk=true"
        database = "dbname"
        collection = "destcollection"
        upsert-enable = true
        primary-key = ["pkname"]

    }

It ends with the exception shown under "Error Exception".

It seems to be related to the MongoDB Java library, but which jar is needed, and where must it be installed? I installed the driver, but nothing I tried works.

I tried with:

mongodb-driver-core-5.1.0
mongodb-java-driver-5.1.0

and, as an alternative:

mongodb-driver-sync-5.1.0

placing the jars into:

seatunnel_home\lib
plugins\MongoDB\lib
connectors\seatunnel

The documentation link to the correct download is broken.

SeaTunnel Version

2.3.4

SeaTunnel Config

    #Also into the first attachment:

    env {
        parallelism = 2
        job.mode=STREAMING
        job.name=SeaTunnel_Job
        read_limit.bytes_per_second=7000000
        read_limit.rows_per_second=400
    }

    source {
      Oracle-CDC {
        result_table_name = "tab1"
        base-url = "jdbc:oracle:thin:user/password@ip:1521/service_name"
        source.reader.close.timeout = 120000
        username = "user"
        password = "password"
        database-names = ["DBNAME"]
        # real db name DBNAME.domain.local ( it works with DBNAME )
        schema-names = ["SCHEMA"]
        startup.mode = "INITIAL"
        table-names = ["DBNAME.SCHEMA.TABLE1"]
      }
    }

    sink {
        MongoDB {
            source_table_name = "source"
            uri = "mongodb://user:pwd@ipaddress:27017/dbname?readPreference=secondary&slaveOk=true"
            database = "dbname"
            collection = "destcollection"
            upsert-enable = true
            primary-key = ["pkname"]
        }
    }

Running Command

```shell
java -Dlog4j2.configurationFile=E:\programmi\apache-seatunnel-2.3.4\config\log4j2_client.properties -Dhazelcast.client.config=E:\programmi\apache-seatunnel-2.3.4\config\hazelcast-client.yaml -Dseatunnel.config=E:\programmi\apache-seatunnel-2.3.4\config\seatunnel.yaml -Dhazelcast.config=E:\programmi\apache-seatunnel-2.3.4\config\hazelcast.yaml -Dseatunnel.logs.file_name=seatunnel-starter-clienttest -Xms256m -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=E:\programmi\apache-seatunnel-2.3.4\dump\zeta-client  -cp E:\programmi\apache-seatunnel-2.3.4\lib\*;E:\programmi\apache-seatunnel-2.3.4\starter\seatunnel-starter.jar org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient  --config .\config\v2.batch.config.template -m local
```

Error Exception

2024-05-06 13:42:06,610 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Fatal Error, 

2024-05-06 13:42:06,611 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Please submit bug report in https://github.com/apache/seatunnel/issues

2024-05-06 13:42:06,612 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Reason:SeaTunnel job executed failed 

2024-05-06 13:42:06,614 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:199)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: java.lang.RuntimeException: Plugin PluginIdentifier{engineType='seatunnel', pluginType='sink', pluginName='MongoDB'} not found.
    at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:231)
    at org.apache.seatunnel.engine.core.parse.ConnectorInstanceLoader.loadSinkInstance(ConnectorInstanceLoader.java:77)
    at org.apache.seatunnel.engine.core.parse.JobConfigParser.parseSink(JobConfigParser.java:194)
    at org.apache.seatunnel.engine.core.parse.JobConfigParser.parseSinks(JobConfigParser.java:170)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parseSink(MultipleTableJobConfigParser.java:531)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parse(MultipleTableJobConfigParser.java:193)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.getLogicalDag(ClientJobExecutionEnvironment.java:88)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.execute(ClientJobExecutionEnvironment.java:161)
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:146)
    ... 2 more

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

Java

Screenshots

No response

Hisoka-X commented 2 months ago

Could you try with 2.3.5? It should be fixed by #6551

a11dev commented 2 months ago

Thanks, I will try that (with the 2.3.4 connectors). Where can the 2.3.5 connectors be downloaded? The links in the docs are broken!

Alessandro

Hisoka-X commented 2 months ago

You can get the download link from https://www.apache.org/dyn/closer.lua/seatunnel/2.3.5/apache-seatunnel-2.3.5-bin.tar.gz

a11dev commented 2 months ago

Again:

2024-05-09 08:16:48,437 ERROR [o.a.s.c.s.SeaTunnel ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: java.lang.RuntimeException: Plugin PluginIdentifier{engineType='seatunnel', pluginType='sink', pluginName='MongoDB'} not found.
    at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:234)
    at org.apache.seatunnel.engine.core.parse.ConnectorInstanceLoader.loadSinkInstance(ConnectorInstanceLoader.java:77)
    at org.apache.seatunnel.engine.core.parse.JobConfigParser.parseSink(JobConfigParser.java:159)
    at org.apache.seatunnel.engine.core.parse.JobConfigParser.parseSinks(JobConfigParser.java:135)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parseSink(MultipleTableJobConfigParser.java:517)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parse(MultipleTableJobConfigParser.java:200)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.getLogicalDag(ClientJobExecutionEnvironment.java:88)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.execute(ClientJobExecutionEnvironment.java:156)
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:149)
    ... 2 more

The question is which jar is missing and where it must be placed. Since I'm using the Zeta engine, I assume it belongs in SeaTunnel's .\lib directory, so I added mongodb-driver-sync-5.1.0.jar there.

MongoDB sink:

    MongoDB {
        source_table_name = "assignments"
        uri = @.*** :27017/dbname?readPreference=secondary&slaveOk=true"
        database = "dbname"
        collection = "assignments"
        upsert-enable = true
        primary-key = ["w6key"]
    }

thanks Alessandro

Hisoka-X commented 2 months ago

Did you execute bin/install-plugin.sh? See https://seatunnel.apache.org/docs/2.3.5/start-v2/locally/deployment#step-3-install-connectors-plugin

a11dev commented 2 months ago

Connectors are not available.

This is the install_plugin output:

    [...]
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.8:get (default-cli) on project standalone-pom: Couldn't download artifact: Missing:
    [ERROR] ----------
    [ERROR] 1) org.apache.seatunnel: connector-mongodb:jar:2.3.5
    [ERROR]
    [ERROR] Try downloading the file manually from the project website.
    [ERROR]
    [ERROR] Then, install it using the command:
    [ERROR] mvn install:install-file -DgroupId=org.apache.seatunnel -DartifactId= connector-mongodb -Dversion=2.3.5 -Dpackaging=jar -Dfile=/path/to/file
    [ERROR]
    [ERROR] Alternatively, if you host your own repository you can deploy the file there:
    [ERROR] mvn deploy:deploy-file -DgroupId=org.apache.seatunnel -DartifactId= connector-mongodb -Dversion=2.3.5 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
    [ERROR]
    [ERROR] Path to dependency:
    [ERROR] 1) org.apache.maven.plugins:maven-downloader-plugin:jar:1.0
    [ERROR] 2) org.apache.seatunnel: connector-mongodb:jar:2.3.5
    [ERROR]
    [ERROR] ----------
    [ERROR] 1 required artifact is missing.
    [...]

Also, the connector links on the SeaTunnel website are not working; it seems the connector libraries are no longer available from the repository. Has the URL changed?

thanks Alessandro

Hisoka-X commented 2 months ago

cc @liugddx

a11dev commented 3 weeks ago

Hi Jia, I resumed my SeaTunnel tests. I'm still trying to sync MongoDB from Oracle CDC.

First, as I wrote earlier, the Apache SeaTunnel documentation points to the wrong Maven repository. In the end I found the connector by manually navigating https://mvnrepository.com/artifact/org.apache.seatunnel. The plugin installer throws an exception because the Maven repository is not available. The documentation on manual installation doesn't mention the other libraries that are needed, mongodb-driver-sync-4.7.1 and mongodb-driver-core-4.7.1, as required by the connector manifest.

Anyway, I got this part working.
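
For anyone repeating the manual download, the jar locations can be derived from the Maven coordinates, because Maven repositories use a fixed directory layout. This is a minimal sketch, assuming the three artifacts mentioned above are published on Maven Central under the `org.apache.seatunnel` and `org.mongodb` group IDs at these versions (the thread does not confirm which repository was used). `jar_url` is a hypothetical helper, not part of SeaTunnel.

```python
# Base URL of Maven Central (assumption: the jars are hosted there).
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def jar_url(group_id, artifact_id, version, base=MAVEN_CENTRAL):
    """Build the jar URL for a Maven coordinate using the standard repository layout."""
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


# Coordinates taken from the thread; the org.mongodb group ID is the driver's usual one.
COORDINATES = [
    ("org.apache.seatunnel", "connector-mongodb", "2.3.5"),
    ("org.mongodb", "mongodb-driver-sync", "4.7.1"),
    ("org.mongodb", "mongodb-driver-core", "4.7.1"),
]

urls = [jar_url(*coordinate) for coordinate in COORDINATES]
for url in urls:
    print(url)
```

The printed URLs can be fetched with any download tool. Where each jar must be placed is not settled in this thread, so check the deployment documentation for the installed version.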

Now I'm able to run the documentation example, but not a real scenario. I have a big Oracle table that I would like to turn into documents in a MongoDB collection. To keep it simple, I applied a transformation in the middle:

    [...]
    transform {
      Sql {
        source_table_name = "assignments"
        result_table_name = "mongoassignments"
        query = "select pkname from assignments"
      }
    }
    [...]

    MongoDB {
      source_table_name = ["mongoassignments"]
      uri = @.***:27017/?authSource=dbname"
      database = "dbname"
      collection = "assignmentskeys"
      upsert-enable = true
      primary-key = [" pkname "]

      schema = {
        fields {
          _id = string
          pkname = bigint
        }
      }
    }

But it throws this error:

    java: Caused by: com.mongodb.MongoBulkWriteException: Bulk write operation error on server XXXXXX:27017. Write errors: [BulkWriteError{index=0, code=2, message='$and/$or/$nor must be a nonempty array', details={}}].

Any idea?

Do you think it would be possible to set up a dev environment and try debugging it?

Thanks a lot, Alessandro
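
For readers who hit the same `$and/$or/$nor must be a nonempty array` error: MongoDB rejects a filter whose `$and` list is empty. One possible cause, which this thread does not confirm, is that none of the configured primary-key names match a field in the row (for example because of stray spaces or a different letter case), so the upsert filter ends up with no conditions. The sketch below is plain Python that only illustrates that failure mode. `build_upsert_filter` is a hypothetical function, not the connector's actual code.

```python
def build_upsert_filter(row, primary_keys):
    """Build a MongoDB-style $and filter from the primary-key fields found in the row."""
    # Keys that do not exist in the row are silently skipped (assumed behaviour).
    conditions = [{key: row[key]} for key in primary_keys if key in row]
    return {"$and": conditions}


row = {"pkname": 42}

# The key name matches the field exactly, so the filter has one condition.
good_filter = build_upsert_filter(row, ["pkname"])

# The key name carries surrounding spaces, so nothing matches and $and is empty.
bad_filter = build_upsert_filter(row, [" pkname "])

print(good_filter)
print(bad_filter)
```

If this is what happened, checking that every `primary-key` entry matches a field name in the transformed table exactly would be a reasonable first step.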

a11dev commented 3 weeks ago

Sorry... I solved it! The basic transfer is working now, so no help is needed on this topic. Sorry to have disturbed you! Is there a way to update a collection in order to perform a denormalization during the Oracle-to-MongoDB sync?

Ale

Hisoka-X commented 2 weeks ago

Is there a way to update a collection in order to perform a denormalization during the Oracle-to-MongoDB sync?

Not sure what you want; could you share an example?

a11dev commented 2 weeks ago

Yes. Source: Oracle, tables A and B. Destination: MongoDB, a collection called AB whose documents are created by embedding B into A, {A,{B}}. Is there a way to configure a sink to embed B into A?

Thanks a lot. Alessandro
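
To make the requested shape concrete, here is a small sketch in plain Python, outside SeaTunnel, of embedding the rows of B into their parent rows of A. It assumes a one-to-many relation in which B carries a foreign key to A. The table contents, the column names (`id`, `a_id`) and the helper `embed_children` are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def embed_children(a_rows, b_rows, a_key, b_foreign_key, field):
    """Return one document per A row, with the matching B rows embedded as a list."""
    children = defaultdict(list)
    for b in b_rows:
        children[b[b_foreign_key]].append(b)
    # A rows without any matching B row get an empty list.
    return [{**a, field: children.get(a[a_key], [])} for a in a_rows]


table_a = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
table_b = [{"a_id": 1, "value": "x"}, {"a_id": 1, "value": "y"}]

docs = embed_children(table_a, table_b, a_key="id", b_foreign_key="a_id", field="b")
for doc in docs:
    print(doc)
```

Each resulting dictionary corresponds to one document of the AB collection described above.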

Hisoka-X commented 2 weeks ago

Sorry, SeaTunnel cannot do this at the moment.

a11dev commented 2 weeks ago

Thanks

Alessandro

a11dev commented 2 weeks ago

Again, thank you for your kind response; I would also like to compliment you! You have all done a fantastic job. SeaTunnel is an excellent tool for data synchronization: a single scalable application that manages all sources and destinations without additional components that increase the complexity of configuration and maintenance.

Thanks Ale
