Closed alberskib closed 10 years ago
@evdokim bump
Hello @evdokim , @bio4j/dynamograph
I investigated the problems with ddwriting further, and it turns out the problem is related to initialization. If I manually initialize the workers and queues, data is processed further. I still have a problem with serializing, as List[PutItemRequest] is not serializable by the JSONSerializer from compota, but I think that as soon as I write a custom serializer for this type everything should work fine.
By manual initialization I mean placing this code in the `addTask` function:
```scala
mapNispero.installWorker()
mapNispero.worker.runInstructions()
uploadNispero.installWorker()
uploadNispero.worker.runInstructions()
singleElements.init()
singleElements.initWrite()
```
It looks like compota does not invoke the runInstructions command on workers. Maybe I am missing something in my code? Some instruction that is responsible for initializing the workers?
@alberskib the first stack trace is not an error; it happens just because I didn't implement the code related to termination of writers, so don't pay attention to it. The issue with the bucket is related to the rights of the instance profile (role) of the instance from which you are running compota. @eparejatobes if you send me credentials for this account I will try to fix it. About the serializers you are right, but I suggest you do not create a new one; instead you can wrap your data in Scala lists and case classes and it should work with the JSONSerializer.
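A minimal sketch of that wrapping idea, assuming nothing about compota's API; `WriteItem` and `WriteBatch` are made-up names standing in for your data, not compota or AWS SDK types:

```scala
// Sketch only: WriteItem/WriteBatch are hypothetical wrappers. Plain case
// classes of lists and maps are product types that a generic JSON serializer
// can traverse, unlike PutItemRequest, which drags in non-serializable
// AWS SDK internals.
case class WriteItem(table: String, attributes: Map[String, String])
case class WriteBatch(items: List[WriteItem])

object WrapExample {
  def totalItems(batch: WriteBatch): Int = batch.items.size

  def main(args: Array[String]): Unit = {
    val batch = WriteBatch(List(
      WriteItem("nodes", Map("id" -> "1", "label" -> "gene")),
      WriteItem("nodes", Map("id" -> "2", "label" -> "protein"))
    ))
    println(totalItems(batch)) // prints 2
  }
}
```

The point is only that the data crossing the queue stays in plain Scala structures; the conversion to real `PutItemRequest` objects can happen at the last moment, inside the worker.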
Hello @evdokim. As you can see there, I do exactly what you just said. The strange thing is that the code for the workers is not running on those machines. I mean, when I run dynamograph-ddwriter from an EC2 machine, additional instances (for the workers) are launched, but it seems the code on them is not executing.
mmm can you connect to the worker instance with ssh (you can use your personal key pair name) and take a look at /root/log.txt?
Ok, great! I wasn't aware that compota stores logs on the worker nodes. Thanks a lot for the help. In the logs I can find the following error:
```
Complete!
download: s3://snapshots.era7.com/bio4j/ddwriter_2.10/0.1.0-SNAPSHOT/jars/ddwriter_2.10-fat.jar to ./ddwriter_2.10-fat.jar
Error: Invalid or corrupt jarfile /root/ddwriter_2.10-fat.jar
```
I will investigate it.
this is the jar file artifact; try to download it, probably you have a problem with publishing
I think the problem is not with permissions for publishing, because the jar is present on S3 and I can download it, etc. But when I try to run `java -jar ddwriter_2.10-fat.jar` I receive the same error: `Error: Invalid or corrupt jarfile /root/ddwriter_2.10-fat.jar`. On the other hand, when I run `java -cp ddwriter_2.10-fat.jar [mainClassFromManifest]`, the program runs. For me the problem is with the manifest.
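One way to inspect the manifest from code, assuming only the standard `java.util.jar` API: this sketch writes a tiny throwaway jar carrying a `Main-Class` attribute, then reads the attribute back. The read-back half is the same check you could run on `ddwriter_2.10-fat.jar` to see whether `java -jar` has a `Main-Class` to use (the class name here is a placeholder):

```scala
import java.io.{File, FileOutputStream}
import java.util.jar.{Attributes, JarFile, JarOutputStream, Manifest}

object ManifestCheck {
  // Build a throwaway jar with a Main-Class attribute in its manifest.
  def writeDemoJar(mainClass: String): File = {
    val f = File.createTempFile("demo", ".jar")
    val mf = new Manifest()
    // MANIFEST_VERSION must be set or the manifest is written out empty.
    mf.getMainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
    mf.getMainAttributes.put(Attributes.Name.MAIN_CLASS, mainClass)
    val out = new JarOutputStream(new FileOutputStream(f), mf)
    out.close()
    f
  }

  // Read the Main-Class attribute back out of an existing jar.
  def readMainClass(f: File): String = {
    val jar = new JarFile(f)
    try jar.getManifest.getMainAttributes.getValue("Main-Class")
    finally jar.close()
  }

  def main(args: Array[String]): Unit =
    println(readMainClass(writeDemoJar("com.example.Main"))) // prints com.example.Main
}
```

If `readMainClass` on the real fat jar returns null, `java -jar` has nothing to launch even though `java -cp` works.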
Hm, I'm not sure, but the problem could be the following bug. The size of the archive is about 124 MB.
Ok. My intuition was correct: after minimizing the dependencies everything works fine. In other words, distributed writing is working well (the current serializer for PutItemRequest is just a prototype). Thanks for the help.
wow, it's interesting, because I have never run into this constraint ))
Me neither. I had heard about various constraints, but never about the number of items inside a jar archive. In fact this limit is really big; it is hard to exceed the maximum number (if we do not merge all transitive dependencies). It turns out that scarph has `"com.thinkaurelius.titan" % "titan-all" % "0.4.4"` as a dependency. After removing it along with its transitive dependencies, the size of the archive fell to about 50 MB.
I think that a potential solution that would help avoid such problems is to pack dependencies as whole jars, rather than unpacking the files out of each jar: a jar containing the dependency jars.
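For reference, the entry-count ceiling in question is presumably the zip format's 65,535-entry limit (without Zip64), which a fat jar that repacks every class file from all transitive dependencies can hit. A small self-contained sketch of counting entries, using a throwaway zip in place of the real fat jar:

```scala
import java.io.{File, FileOutputStream}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

object EntryCount {
  // Build a throwaway zip with `n` empty entries.
  def makeZip(n: Int): File = {
    val f = File.createTempFile("entries", ".zip")
    val out = new ZipOutputStream(new FileOutputStream(f))
    (1 to n).foreach { i =>
      out.putNextEntry(new ZipEntry(s"entry$i.txt"))
      out.closeEntry()
    }
    out.close()
    f
  }

  // Count the entries; the same check is worth running on a fat jar
  // against the 65,535-entry ceiling of the non-Zip64 zip format.
  def countEntries(f: File): Int = {
    val zip = new ZipFile(f)
    try zip.size() finally zip.close()
  }

  def main(args: Array[String]): Unit =
    println(countEntries(makeZip(3))) // prints 3
}
```

Packing dependencies as nested whole jars keeps the entry count at roughly one per dependency instead of one per class file, which is why it sidesteps the limit.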
Full log from running ddwriting:
When running compota-wordcount, the issue is related to reading from other buckets:
But if I replace the bucket configuration with the bio4j one, I get: