FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0

DDQ requires spark-hive #92

Open FRosner opened 8 years ago

FRosner commented 8 years ago
[error] (compile:compile) scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in Check.class refers to term hive
[error] in package org.apache.spark.sql which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling Check.class.

We might want to get rid of this dependency?

pieterjanverbruggen commented 7 years ago

This issue is preventing me from packaging DDQ in my project. How can we remove the dependency?

FRosner commented 7 years ago

@pieterjanverbruggen thanks for reporting this. I don't fully remember what the problem was here, as I managed to package it eventually. Can you post the part of your build.sbt that defines the library dependencies?

FRosner commented 7 years ago

We can also talk in the gitter chat if you'd like.

pieterjanverbruggen commented 7 years ago

I'm using Maven. I sent you the extract via Gitter.

pieterjanverbruggen commented 7 years ago

thanks a lot for looking in

FRosner commented 7 years ago

Resolved by just adding the spark-hive dependency 👍

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.number}</artifactId>
        <version>1.6.1</version>
    </dependency>
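
For sbt users, a rough equivalent of the Maven snippet above might look like the sketch below. It assumes the same Spark version (1.6.1) as in the Maven example; adjust it to whatever Spark version your project actually uses. The `%%` operator appends your project's Scala binary version to the artifact name, which plays the same role as `${scala.number}` in the `artifactId` above.

```scala
// build.sbt (sketch): add spark-hive so that the reference to
// org.apache.spark.sql.hive in Check.class can be resolved at compile time.
// The version 1.6.1 is taken from the Maven example above and is an assumption
// for any other project; it should match your Spark version.
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.1"
```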

maxwellangalabiri commented 3 years ago

I am new to programming, so I don't know how to go about adding the library's dependencies. Can you guide me, please?

FRosner commented 3 years ago

What have you tried so far? What is the problem? Can you share your build definition (pom.xml e.g.)?

maxwellangalabiri commented 3 years ago

I am quite new to all of this. I just made a switch to data science, so I am still trying to get used to it, especially the hard coding part. I am presently doing an internship with a company and I would love to use DDQ in the project I am working on now. How can I set it up and use it in Zeppelin, please?

FRosner commented 3 years ago

Ah ok I thought you commented here because you had the same issue. Did you manage to access Spark from Zeppelin?

maxwellangalabiri commented 3 years ago

It’s a company’s product so it’s already set up. I am so excited to get a response from you. I really appreciate it.

FRosner commented 3 years ago

Ok. Can you ask the person who set up Zeppelin to add https://spark-packages.org/package/FRosner/drunken-data-quality? I don't know if it works with your version of Spark since I haven't updated the project in quite some time.

maxwellangalabiri commented 3 years ago

That will not be possible, as I am only an I.T. student. But I can set up my personal Zeppelin and add DDQ functionality if I get a guide on how to go about it. I am very sorry for stressing you with this. I saw its functionality and it’s intriguing. I am very grateful for your assistance.

FRosner commented 3 years ago

Then I recommend you try to get Zeppelin running on your personal computer. You can check out https://zeppelin.apache.org/docs/0.9.0/ for instructions. Then you could try to load the Spark package. Back in the day that could be done using something like this, but I have no clue whether it still works. You can ask the person who set up Zeppelin in your company how to add Spark packages.

    %dep
    z.load("FRosner:drunken-data-quality:5.0.0-s_2.11")

maxwellangalabiri commented 3 years ago

I have PySpark already running on my personal machine and I tried adding the package, but it threw an error that “the dependencies cannot be found”. How do I add the dependencies so I can use it with PySpark instead?

FRosner commented 3 years ago

Can you please post the entire error message and the steps to reproduce it, i.e. what you did to arrive at that error message? Otherwise I cannot help you :)