Closed et304383 closed 9 years ago
@djoos everything OK on your end? It's not like you to not respond to issues. No sarcasm: I'm genuinely concerned.
Hi @eric-tucker,
thanks for getting in touch! I'm ok, thanks! Sorry: I've had quite a lot on at work and this one slipped through the net... :-/
Hmmm, I found a related issue here. You mentioned you spun up 2 fresh instances, so the "Can not create files table: disk I/O error" is most likely not down to running out of disk space, right? Any kind of (instance-related and also weird) limits AWS might be enforcing? (even so: it should be less random in that case and probably more clear-cut as to what is causing it)
An interesting one - looking forward to hearing from you!
Kind regards, David
[EDIT: updated the link to the issue on the AWS forum]
I saw that issue as well in my Google travels.
I still see this happen randomly today. We routinely destroy and recreate our test servers daily. Today, an instance failed to provision even though the same CloudFormation stack and Chef recipes provisioned successfully yesterday (no yum error).
It only seems to happen from the newrelic install though - no other yum installations throw such an error. It could be a coincidence and have nothing to do with newrelic, though.
Ok, so it's also not provider-specific (nor a case of a full disk)... Do you guys use vagrant? If so, have you ever run into the issue on a local vm as well? In the meantime, I'll reach out to New Relic support as well...
I'll keep you updated!
Kind regards, David
We do not use Vagrant, sorry. Hopefully something comes up on Newrelic's end!
Hi @eric-tucker,
so I've got a reply from a lead technical support engineer at New Relic...
"Thank you for reaching out. This is not something I have seen before. So I will escalate this to our LSM experts who should have a much better understanding of the issue." (continues below)
However, as we currently have some nodes with a slightly older version of the Linux server monitor, he picked up on this and asked us to make sure we've got the latest installed. Would you be able to give me the version of the server monitor you've got installed?
"Is there any additional information you can share about your system which might be relevant? Would you be willing to give us a quick description of your environment?" Well, here is where it would digress from our setup obviously and I'm unable to share more specifics...
Perhaps it might be worth you opening an issue with New Relic? In that case I don't even mind you referencing my ticket number, just so they can tie things together on their end in Zendesk: 158724. As far as I currently understand it, it is not a newrelic cookbook-related issue, however it obviously manifests itself when using the cookbook...
Thanks in advance for your feedback!
Kind regards, David
Well, considering it's a fresh install using the latest 2.x newrelic cookbook, I'd say I'm installing the latest monitor.
Berksfile:
cookbook "newrelic", "< 3.0.0"
The server is a standard Amazon Linux server using the base Amazon AMI for the 2015.03 Amazon Linux release.
This might help (a snippet of our recipe doing the yum operations):
if ["amazon", "centos", "redhat"].include?(node[:platform])
  include_recipe "yum-epel"
end

# Fix for non 8 GB root volumes. Ignore any errors as this is just a convenience call.
execute "resize2fs /dev/xvda1 2>/dev/null"

include_recipe "codedeploy-agent"

# Fix for codedeploy-agent startup script not checking that the PID in the PID file is
# actually a codedeploy agent process
cookbook_file "/opt/codedeploy-agent/bin/codedeploy-agent-pid-fix.sh" do
  mode 0755
end

cron "codedeploy-agent-pid-fix" do
  minute "*/15"
  command "/opt/codedeploy-agent/bin/codedeploy-agent-pid-fix.sh"
end

package "htop"

unless node[:newrelic][:license].nil?
  include_recipe "newrelic::server_monitor_agent"
end
So as far as yum operations go, it looks like I'm enabling the yum EPEL repository, installing htop, then moving to the server monitor recipe.
I could open a ticket with newrelic if you're confident it's not the cookbook causing the issue.
I've got another update through on the ticket...
"The error message shown looks to be a problem specific to the sqlitecachec Python package on the user's systems when attempting to download, cache, and actually install the newrelic package through yum (through Chef). Given that the consistent error in all of these failed Chef command executions is:
TypeError: Can not create files table: disk I/O error
I can only imagine that something on the filesystem (or something about the filesystem/database - out of space, bad permissions, inability to access a yum caching database, etc.) is resulting in the issue. In either case, it doesn't look to be anything specific to New Relic or our services.
That being said, we're unable to provide any official support on this use case, but a few cursory things to check might be to ensure that the version of the sqlitecachec package is up to date on the machine image in use, ensuring whichever SQLite database used by yum won't run into any of the issues described above (permissions, etc.), and anything else related to yum interacting with any backing SQLite stores."
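For anyone who wants to act on the filesystem part of that advice, here is a minimal Ruby sketch of a cache-directory sanity check. It is only an illustration under assumptions: the helper names (`write_probe_error`, `suspect_sqlite_files`, `check_cache_dir`) are hypothetical, the directory to inspect is whatever you pass in (yum's cache is commonly under /var/cache/yum, but verify that on your image), and it does not call yum or the sqlitecachec package at all. It simply tries a write/fsync/read-back in the directory and flags `*.sqlite` files that are unreadable or lack the standard SQLite 3 file header.

```ruby
require "tempfile"
require "find"

# First 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = "SQLite format 3\0".b

# Hypothetical helper: create, write, fsync and read back a small temp file
# in +dir+. Returns nil on success, or an error description on failure
# (missing directory, bad permissions, no space, I/O error, ...).
def write_probe_error(dir, bytes = 4096)
  Tempfile.create("yum-probe", dir) do |f|
    f.binmode
    payload = ("x" * bytes).b
    f.write(payload)
    f.fsync
    f.rewind
    return "read-back mismatch" unless f.read == payload
  end
  nil
rescue SystemCallError, IOError => e
  "#{e.class}: #{e.message}"
end

# Hypothetical helper: list *.sqlite files under +dir+ that cannot be read
# or do not start with the SQLite 3 header (zero-byte files are flagged too).
def suspect_sqlite_files(dir)
  return [] unless File.directory?(dir)

  Find.find(dir).select do |path|
    next false unless File.file?(path) && path.end_with?(".sqlite")

    begin
      File.binread(path, SQLITE_MAGIC.bytesize) != SQLITE_MAGIC
    rescue SystemCallError
      true
    end
  end
end

# Combine both checks into one report hash.
def check_cache_dir(dir)
  {
    dir: dir,
    write_error: write_probe_error(dir),
    suspect_sqlite: suspect_sqlite_files(dir)
  }
end

# Only runs when a directory is given on the command line.
puts check_cache_dir(ARGV[0]).inspect if ARGV[0]
```

Run as root on an affected instance with the cache directory as the only argument. A non-nil `write_error` or a non-empty `suspect_sqlite` list would point towards the filesystem rather than the cookbook; a clean report does not rule out a transient error during the actual yum run.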
Sorry @eric-tucker - didn't want to just close and comment, but was handling this update I literally just got through on my mobile... (fat fingers)
Any further luck troubleshooting this one @eric-tucker?
Kind regards, David
No, but considering it's not related to this cookbook (though this appears to be the only place it manifests), you can close this and I'll take the advice from Newrelic into consideration. I'll open a ticket with them if I need further assistance.
Perfect, thanks @eric-tucker!
Kind regards, David
@eric-tucker, I'm hitting the same exact issue with our cookbook and installing New Relic. Did the advice help?
Just came across this same issue. Wasn't able to get to the bottom of it...but it's working again for now.
This is only happening for the newrelic yum install so I'm not convinced it's a widespread yum race condition issue. I'm using the latest version of this cookbook.
I spun up two fresh EC2 instances today using the same run list and one failed randomly. I've seen this numerous times (from newrelic install) and it's very difficult to debug. Here's the relevant stack trace: