nhartwic closed this issue 2 years ago.
What Python version are you using? Note in the output you provided:
2.0.0 Requires-Python >=3.7,<3.9
I assume you are still on 3.6, which is past EOL. Try using a newer Python version? Also, the job log may be helpful here in deciphering the issue.
In my case, the issue was that my Python version was too new.
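Both failure modes come from the same constraint in the quoted pip output, `Requires-Python >=3.7,<3.9` for Tibanna 2.0.0. Below is a minimal standard-library sketch for checking the running interpreter against that range before installing. The script and the function name `python_supported` are hypothetical, and the bounds come from the pip message, so other Tibanna releases may use different ones:

```python
import sys

# Bounds copied from the pip output quoted above: Requires-Python >=3.7,<3.9
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 9)


def python_supported(version_info=None):
    """Return True if the (major, minor) version falls inside [3.7, 3.9)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_supported():
        print(f"Python {current} satisfies >=3.7,<3.9")
    else:
        print(f"Python {current} does NOT satisfy >=3.7,<3.9; "
              "use an environment with Python 3.7 or 3.8")
```

Run it with the same `python` that `pip` uses. A 3.6 interpreter (too old) and a 3.9+ interpreter (too new) both fail the check, which matches the two reports above.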
Upgrading didn't resolve the plot_metrics issue. Here is a new plot_metrics output...
https://salk-tm-logs.s3.amazonaws.com/aA6bccyAeLQt.metrics/metrics.html
Seems to me like the issue is that your job is too short for the metrics behind those plots to be extracted from CloudWatch; if it ran longer, you would see the plots. Because the job length is very close to the CloudWatch reporting interval, odds are you won't see any data points over that time frame, or the job finishes and makes the API call before CloudWatch has processed the data point and made it available. See these functions that pull that data. My guess is that those API calls are returning no data points because the job is so short. To verify, you could write a small script that makes the same calls and run it while your job executes.
With that said, if you go into CloudWatch and do in fact see metrics over the job time frame, we will look into it further. But it could also be the case that the data point (probably singular) just comes in too late for Tibanna to catch it. If you add a 15 minute sleep at the end of your job, I suspect the metrics will report as expected.
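Here is a rough sketch of the verification script suggested above, using boto3's CloudWatch `get_metric_statistics`. It assumes that CPU comes from the built-in `AWS/EC2` `CPUUtilization` metric and that the instance-side script publishes `MemoryUtilization` under the `System/Linux` namespace with an `InstanceId` dimension. Those namespace, metric, and dimension choices are assumptions, not taken from Tibanna's code. It also assumes AWS credentials and a default region are configured. Run it in a separate terminal with the job's EC2 instance ID while the job executes:

```python
import datetime as dt
import sys
import time

# Assumed metric locations (not confirmed from Tibanna's source):
# CPU is EC2's built-in metric; memory is assumed to be published by the
# instance-side monitoring script under the "System/Linux" namespace.
METRICS = [
    ("AWS/EC2", "CPUUtilization"),
    ("System/Linux", "MemoryUtilization"),
]


def build_query(instance_id, namespace, metric_name, end,
                window_minutes=60, period_seconds=60):
    """Keyword arguments for get_metric_statistics over a trailing time window."""
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - dt.timedelta(minutes=window_minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Maximum"],
    }


def summarize(datapoints):
    """Return (number of datapoints, largest 'Maximum' value or None if empty)."""
    values = [point["Maximum"] for point in datapoints]
    return len(values), (max(values) if values else None)


def poll(instance_id, rounds=15, sleep_seconds=60):
    """Print how many datapoints CloudWatch returns for each metric, once per interval."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch")
    for _ in range(rounds):
        now = dt.datetime.now(dt.timezone.utc)
        for namespace, metric_name in METRICS:
            response = cloudwatch.get_metric_statistics(
                **build_query(instance_id, namespace, metric_name, now))
            count, peak = summarize(response["Datapoints"])
            print(f"{now:%H:%M:%S} {namespace}/{metric_name}: "
                  f"{count} datapoints, max={peak}")
        time.sleep(sleep_seconds)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        poll(sys.argv[1])  # e.g. python poll_cw.py i-0123456789abcdef0
    else:
        print("usage: python poll_cw.py <instance-id>")
```

With EC2 basic monitoring, CPU datapoints arrive in 5-minute periods. A job that lasts only a few minutes may therefore legitimately show zero or one datapoint. A persistently empty count for the memory metric during a long run would instead suggest that nothing on the instance is publishing it.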
Ok, this job ran for an hour:
https://salk-tm-logs.s3.amazonaws.com/eTcBZv8sWlB8.metrics/metrics.html
...Here are all the "log_files" that were uploaed to s3 and are associated with the job...
...Cloudwatch logs associated with the job below...
Will, the cpu usage is coming directly from Cloudwatch whereas the rest is sent from the instance. Looks like the Perl script that sends these metrics no longer works. It could be a problem on the AWS side since they were planning to deprecate it.
Do you have this issue on your CGAP/4DN runs?
On Tue, Jul 26, 2022, 6:20 PM nhartwic @.***> wrote:
Ok, this job ran for an hour:
https://salk-tm-logs.s3.amazonaws.com/eTcBZv8sWlB8.metrics/metrics.html
...Here are all the "log_files" that were uploaed to s3 and are associated with the job...
job_eTcBZv8sWlB8.tar.gz https://github.com/4dn-dcic/tibanna/files/9193659/job_eTcBZv8sWlB8.tar.gz
...Cloudwatch logs associated with the job below...
log-events-viewer-result.eTcBZv8sWlB8.csv https://github.com/4dn-dcic/tibanna/files/9193664/log-events-viewer-result.eTcBZv8sWlB8.csv
— Reply to this email directly, view it on GitHub https://github.com/4dn-dcic/tibanna/issues/367#issuecomment-1196037820, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHLO3FFB56A77SEI3KOYMTVWBQCFANCNFSM54UCIOQA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
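To illustrate the split described in the reply above: EC2 records CPU utilization itself, whereas memory and disk figures only exist in CloudWatch if something on the instance publishes them. The Perl script is not shown in this thread, so the following is a hypothetical minimal Python stand-in for the memory part only. It reads `/proc/meminfo` and calls boto3's `put_metric_data`. The `System/Linux` namespace, the `MemoryUtilization` name, and the used-memory formula (free memory, buffers, and page cache counted as available) are assumptions, not Tibanna's actual implementation:

```python
import re
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def memory_utilization(fields):
    """Percent of memory in use, treating free, buffers, and page cache as available."""
    total = fields["MemTotal"]
    used = total - fields["MemFree"] - fields.get("Buffers", 0) - fields.get("Cached", 0)
    return 100.0 * used / total


def publish(instance_id, percent):
    """Push one datapoint to CloudWatch (namespace and metric name are assumptions)."""
    import boto3  # imported lazily so the parsing helpers work without boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace="System/Linux",
        MetricData=[{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Unit": "Percent",
            "Value": percent,
        }],
    )


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open("/proc/meminfo") as handle:
            percent = memory_utilization(parse_meminfo(handle.read()))
        publish(sys.argv[1], percent)
        print(f"published MemoryUtilization={percent:.1f}%")
    else:
        print("usage: python push_mem.py <instance-id>")
```

If CPU datapoints appear for an instance but no memory datapoints do, the publishing side on the instance is the likelier culprit, which is consistent with the diagnosis above.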
Not 100% sure what you are referring to, but I don't think I use CGAP/4DN.
I assume Tibanna uses some specific image that includes Tibanna and its dependencies when spinning up EC2 instances. If the issue is with Perl, it seems like the solution is updating that image?
We are developing Tibanna for the CGAP/4DN projects.
This is a strange issue. I checked a few recent 4DN runs and the metrics reports look ok. However, I found one where it is incomplete as well:
https://tibanna-output.s3.amazonaws.com/BuYiLLidEFPL.metrics/metrics.html (complete)
https://tibanna-output.s3.amazonaws.com/lDZmmG1vMVnT.metrics/metrics.html (incomplete)
Same workflow, same machine, both ran yesterday with the latest version of Tibanna for more than 5 hours, but the second one has an empty metrics.tsv and metrics_report.tsv.
It is possible there are multiple issues happening here. I haven't seen complete metrics on any of my jobs that I've checked. I'm not sure how many I've checked, but at least a dozen. I'm about to kick off a bunch of assemblies again so I'll check more jobs soonish.
Thanks for bearing with us on this. I can confirm we are also seeing this issue on our jobs (of all lengths). We're looking into it and will hopefully have a resolution soon.
Basically the title. My plot_metrics calls fail to produce the main figures. For reference, here is a recent "plot_metrics" result...
https://salk-tm-logs.s3.amazonaws.com/1JYcXdBkgBlp.metrics/metrics.html
I'm currently on Snakemake 7.9.0 and Tibanna 1.9.2. I have tried to upgrade to Tibanna 2.0.0, but something seems to be wrong with pip. See error below...