Closed alberskib closed 10 years ago
@evdokim bump
Hello @evdokim , @bio4j/dynamograph
I investigated the problems with ddwriting further, and it turns out the problem is related to initialization. If I manually initialize the workers and queues, data is processed further. I still have a problem with serializing, as List[PutItemRequest] is not serializable by the JSONSerializer from compota, but I think that as soon as I write a custom serializer for this type everything should work fine.
By manual initialization I mean placing this code in the `addTask` function:
```scala
mapNispero.installWorker()
mapNispero.worker.runInstructions()
uploadNispero.installWorker()
uploadNispero.worker.runInstructions()
singleElements.init()
singleElements.initWrite()
```
It looks like compota does not invoke the runInstructions command on workers. Maybe I am missing something in my code? Some instruction that is responsible for initializing the workers?
@alberskib the first stack trace is not an error; it happens just because I didn't implement the code related to termination of writers, so don't pay attention to it. The issue with the bucket is related to the rights of the instance profile (role) of the instance from which you are running compota. @eparejatobes if you send me credentials for this account I will try to fix it. About the serializers you are right, but I suggest you do not create a new one; instead you can wrap your data in Scala lists and case classes and it should work with the JSONSerializer.
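A minimal sketch of that wrapping idea, assuming nothing about compota's API; `WriteItem` and `WriteBatch` are made-up names standing in for your data, not compota or AWS SDK types:

```scala
// Sketch only: WriteItem/WriteBatch are hypothetical wrappers. Plain case
// classes of lists and maps are product types that a generic JSON serializer
// can traverse, unlike PutItemRequest, which drags in non-serializable
// AWS SDK internals.
case class WriteItem(table: String, attributes: Map[String, String])
case class WriteBatch(items: List[WriteItem])

object WrapExample {
  def totalItems(batch: WriteBatch): Int = batch.items.size

  def main(args: Array[String]): Unit = {
    val batch = WriteBatch(List(
      WriteItem("nodes", Map("id" -> "1", "label" -> "gene")),
      WriteItem("nodes", Map("id" -> "2", "label" -> "protein"))
    ))
    println(totalItems(batch)) // prints 2
  }
}
```

The point is only that the data crossing the queue stays in plain Scala structures; the conversion to real `PutItemRequest` objects can happen at the last moment, inside the worker.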
Hello @evdokim. As you can see there, I do exactly what you just said. The strange thing is that the code for the workers is not running on those machines. I mean, when I run dynamograph-ddwriter from an EC2 machine, additional instances (for the workers) are launched, but it seems the code on them is not executing.
mmm can you connect to the worker instance with ssh (you can use your personal key pair name) and take a look at /root/log.txt?
Ok, great! I wasn't aware that compota stores logs on the worker nodes. Thanks a lot for the help. In the logs I can find the following error:
```
Complete!
download: s3://snapshots.era7.com/bio4j/ddwriter_2.10/0.1.0-SNAPSHOT/jars/ddwriter_2.10-fat.jar to ./ddwriter_2.10-fat.jar
Error: Invalid or corrupt jarfile /root/ddwriter_2.10-fat.jar
```
I will investigate it.
this is the jar file artifact; try to download it, probably you have a problem with publishing
I think the problem is not with permissions for publishing, because the jar is present on S3 and I can download it, etc. But when I try to run `java -jar ddwriter_2.10-fat.jar` I receive the same error: `Error: Invalid or corrupt jarfile /root/ddwriter_2.10-fat.jar`. On the other hand, when I run `java -cp ddwriter_2.10-fat.jar [mainClassFromManifest]`, the program runs. For me the problem is with the manifest.
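One way to inspect the manifest from code, assuming only the standard `java.util.jar` API: this sketch writes a tiny throwaway jar carrying a `Main-Class` attribute, then reads the attribute back. The read-back half is the same check you could run on `ddwriter_2.10-fat.jar` to see whether `java -jar` has a `Main-Class` to use (the class name here is a placeholder):

```scala
import java.io.{File, FileOutputStream}
import java.util.jar.{Attributes, JarFile, JarOutputStream, Manifest}

object ManifestCheck {
  // Build a throwaway jar with a Main-Class attribute in its manifest.
  def writeDemoJar(mainClass: String): File = {
    val f = File.createTempFile("demo", ".jar")
    val mf = new Manifest()
    // MANIFEST_VERSION must be set or the manifest is written out empty.
    mf.getMainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
    mf.getMainAttributes.put(Attributes.Name.MAIN_CLASS, mainClass)
    val out = new JarOutputStream(new FileOutputStream(f), mf)
    out.close()
    f
  }

  // Read the Main-Class attribute back out of an existing jar.
  def readMainClass(f: File): String = {
    val jar = new JarFile(f)
    try jar.getManifest.getMainAttributes.getValue("Main-Class")
    finally jar.close()
  }

  def main(args: Array[String]): Unit =
    println(readMainClass(writeDemoJar("com.example.Main"))) // prints com.example.Main
}
```

If `readMainClass` on the real fat jar returns null, `java -jar` has nothing to launch even though `java -cp` works.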
Hm, I'm not sure, but the problem could be the following bug. The size of the archive is about 124 MB.
Ok. My intuition was correct: after minimizing the dependencies everything works fine. In other words, distributed writing is working well (the current serializer for PutItemRequest is just a prototype). Thanks for the help.
wow, it's interesting, because I have never run into this constraint ))
Me neither. I had heard about various constraints, but never about the number of items inside a jar archive. In fact this limit is really big; it is hard to exceed the maximum number (if we do not merge all transitive dependencies). It turns out that scarph has `"com.thinkaurelius.titan" % "titan-all" % "0.4.4"` as a dependency. After removing it along with its transitive dependencies, the size of the archive fell to about 50 MB.
I think that a potential solution that would help avoid such problems is to pack dependencies as whole jars, rather than unpacking the files out of each jar: a jar containing the dependency jars.
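For reference, the entry-count ceiling in question is presumably the zip format's 65,535-entry limit (without Zip64), which a fat jar that repacks every class file from all transitive dependencies can hit. A small self-contained sketch of counting entries, using a throwaway zip in place of the real fat jar:

```scala
import java.io.{File, FileOutputStream}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

object EntryCount {
  // Build a throwaway zip with `n` empty entries.
  def makeZip(n: Int): File = {
    val f = File.createTempFile("entries", ".zip")
    val out = new ZipOutputStream(new FileOutputStream(f))
    (1 to n).foreach { i =>
      out.putNextEntry(new ZipEntry(s"entry$i.txt"))
      out.closeEntry()
    }
    out.close()
    f
  }

  // Count the entries; the same check is worth running on a fat jar
  // against the 65,535-entry ceiling of the non-Zip64 zip format.
  def countEntries(f: File): Int = {
    val zip = new ZipFile(f)
    try zip.size() finally zip.close()
  }

  def main(args: Array[String]): Unit =
    println(countEntries(makeZip(3))) // prints 3
}
```

Packing dependencies as nested whole jars keeps the entry count at roughly one per dependency instead of one per class file, which is why it sidesteps the limit.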
Full log from running ddwriting:
When running compota-wordcount, the issue is related to reading from other buckets:
But if I replace the bucket configuration with the bio4j one, I get: