cloyne / network


/dev/sdc hard drive on server2 failed #110

Closed mitar closed 7 years ago

mitar commented 7 years ago

It is OK, because this is a RAID1 array with three drives:

md1 : active raid1 sdc1[0](F) sdb1[1] sda1[2]
      1465006080 blocks super 1.2 [3/2] [_UU]
      bitmap: 7/11 pages [28KB], 65536KB chunk
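How to read the status above, as a minimal sketch that assumes the standard Linux `/proc/mdstat` layout: `(F)` after a member marks it as failed, `[3/2]` means the array is configured for 3 devices but only 2 are active, and each `_` in `[_UU]` is a missing or failed slot. The function name `parse_md_status` is hypothetical and only handles the simple two-line entry shown here.

```python
import re


def parse_md_status(text: str) -> dict:
    """Parse one array entry from /proc/mdstat: the device line plus the status line."""
    lines = text.strip().splitlines()
    # Device line, e.g. "md1 : active raid1 sdc1[0](F) sdb1[1] sda1[2]"
    name, _, rest = lines[0].partition(" : ")
    fields = rest.split()
    state, level, members = fields[0], fields[1], fields[2:]
    failed, others = [], []
    for member in members:
        # "sdc1[0](F)": device name, role number in brackets, optional one-letter flag
        m = re.fullmatch(r"(\w+)\[\d+\](?:\((\w)\))?", member)
        if m is None:
            continue
        # (F) means failed; everything else (including spares flagged (S)) goes to "others"
        (failed if m.group(2) == "F" else others).append(m.group(1))
    # Status line, e.g. "1465006080 blocks super 1.2 [3/2] [_UU]"
    counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", lines[1])
    total, active = int(counts.group(1)), int(counts.group(2))
    return {
        "name": name.strip(),
        "state": state,
        "level": level,
        "failed": failed,
        "others": others,
        "total": total,            # devices the array is configured for
        "active": active,          # devices currently working
        "slots": counts.group(3),  # "_" marks a missing or failed slot
        "degraded": active < total,
    }


sample = """md1 : active raid1 sdc1[0](F) sdb1[1] sda1[2]
      1465006080 blocks super 1.2 [3/2] [_UU]
      bitmap: 7/11 pages [28KB], 65536KB chunk"""
print(parse_md_status(sample))
```

A degraded RAID1 keeps serving data as long as at least one member is healthy, which is why losing one of three drives is not urgent.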

We should probably just take the disk out and remove it from the RAID array so that the server stops complaining. There may be a replacement hard drive of the same size and type (1.5 TB) somewhere in the network closet; if so, we could use it as a replacement. If not, that is also not a problem.

Throw the disk away so that it does not confuse the next network manager.

ck2qsuZT commented 7 years ago

Which bay number?

mitar commented 7 years ago

No idea. :-)

mitar commented 7 years ago

The e-mails you got from SMART include the drive's serial number. Turn off the server and check which hard drive has that serial number.
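Before shutting down, it can help to map device names to serial numbers from the running system. A minimal sketch, assuming you have collected the output of `smartctl -i /dev/sdX` for each drive (that output contains a `Serial Number:` line). The helper names are hypothetical, and the serials in the example are made up. A drive that has failed badly may not answer `smartctl` at all, in which case it is the one left over, or you read the label physically as described above.

```python
import re
from typing import Dict, Optional


def serial_from_smartctl(output: str) -> Optional[str]:
    """Return the value of the 'Serial Number:' line in `smartctl -i` output, if any."""
    m = re.search(r"^Serial Number:\s*(\S+)", output, re.MULTILINE)
    return m.group(1) if m else None


def find_device(outputs: Dict[str, str], wanted_serial: str) -> Optional[str]:
    """Given {device path: smartctl -i output}, return the device with the wanted serial."""
    for device, out in outputs.items():
        if serial_from_smartctl(out) == wanted_serial:
            return device
    return None


# Made-up outputs, trimmed to the relevant line:
outputs = {
    "/dev/sda": "Serial Number:    AAA111\n",
    "/dev/sdc": "Serial Number:    CCC333\n",
}
print(find_device(outputs, "CCC333"))  # prints /dev/sdc
```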

mitar commented 7 years ago

SERVER 2 not 3!

ck2qsuZT commented 7 years ago

Oh, everyone's asleep though

mitar commented 7 years ago

So why did server 3 go down? You should be fixing server 2.

ck2qsuZT commented 7 years ago

I made a mistake :(

mitar commented 7 years ago

I thought so. :-) I just didn't understand what others sleeping has to do with your mistake. :-)

ck2qsuZT commented 7 years ago

An excuse, I guess. Not a very productive thing to do all the time.

mitar commented 7 years ago

Oh, these things happen. :-)

ck2qsuZT commented 7 years ago

All good?

mitar commented 7 years ago

Did you dispose of the hard drive? Have you found a replacement drive? If not, you should now reconfigure the degraded RAID array to have only 2 drives.

Also, now server3 is complaining about a failed drive in bay 8. Did you put a failed drive into it?

ck2qsuZT commented 7 years ago

What's to prevent other managers from doing what I'd like to do (not what I'm actually going to do) and just filtering out the emails? To be fair, I've mentally filtered them out a bit.

mitar commented 7 years ago

Ehm, the idea of automation is to make your work easier. :-) Not harder. If there are issues, you fix them because an automatic system tells you to, instead of waiting for something to crash so badly that it takes much longer to fix (like having to reinstall everything). For me it was a way to do less work while being just as effective, or more. This is, I think, the only manager position where you can have that: tools tell you when something is going wrong, so that otherwise you can sleep and enjoy not doing anything.

Anyway, can you fix those things? Check if we have a replacement hard drive for server2, or reconfigure the RAID-1 array (do not lose data, just change the number of disks). And in server3, remove the failing drive you seem to have put in. Or is this another drive failing in server3?
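Reducing a three-disk RAID-1 to two disks without losing data is normally done with `mdadm`: mark the member failed, remove it, then grow the array down to the new device count. A minimal sketch that only builds and prints the commands (it does not run them), assuming the array is `/dev/md1` and the failed member is `/dev/sdc1` as in the mdstat output earlier in this issue; the helper name is hypothetical. The `--fail` step may be unnecessary when the kernel has already flagged the member with `(F)`.

```python
import shlex
from typing import List


def shrink_raid1_commands(array: str, failed_member: str, remaining: int) -> List[List[str]]:
    """Build (but do not run) the mdadm commands that drop a failed member
    from a RAID-1 array and reduce the configured device count."""
    return [
        # Mark the member as faulty (already done if /proc/mdstat shows "(F)")
        ["mdadm", "--manage", array, "--fail", failed_member],
        # Detach it from the array so it can be pulled out physically
        ["mdadm", "--manage", array, "--remove", failed_member],
        # Tell the array it now has only `remaining` devices, so it is no longer degraded
        ["mdadm", "--grow", array, f"--raid-devices={remaining}"],
    ]


if __name__ == "__main__":
    for argv in shrink_raid1_commands("/dev/md1", "/dev/sdc1", 2):
        print(shlex.join(argv))
```

Printing instead of executing leaves a chance to double-check the device names before running anything as root. Afterwards, `/proc/mdstat` should show something like `[2/2] [UU]`.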

ck2qsuZT commented 7 years ago

Make it easy enough for a monkey/robot to use :)

"here push this button and the machine will stop yelling at you"

or, in this case

"pull out this physical device, put in a backup, order a new backup; if you want more information, go here (link), if not, carry on with your life."

I can read the GitHub issues and figure it out; not all network managers are (or will be) capable of that.

This would require a discretionary budget for hard drives every semester, but $200 a semester is probably reasonable and sustainable. It could even be centralized, if the reason it isn't already is a legal one.

I accidentally pushed the drive back in and lost it. Is there any way to change the email to show the actual physical bay numbers, so that you don't have to ssh in to find out where the physical issue is?

mitar commented 7 years ago

I can read the GitHub issues and figure it out; not all network managers are (or will be) capable of that.

Then don't elect them.

mitar commented 7 years ago

I accidentally pushed the drive back in and lost it. Is there any way to change the email to show the actual physical bay numbers, so that you don't have to ssh in to find out where the physical issue is?

You can program it. :-)

mitar commented 7 years ago

The README describes how you can compute it.

Why did you push it in? Remove it from the bay altogether now, so that the bay is empty (no drive, just the plastic holder).
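One way to "program it" is to have the alert name the bay directly. The README's actual method for computing bay numbers is not reproduced here, so this minimal sketch assumes a hypothetical, hand-maintained table from drive serial number (which the SMART e-mails already contain) to physical bay. The table entries and the function name are placeholders, not the real layout of server2 or server3.

```python
from typing import Dict

# Hypothetical hand-maintained table: drive serial number -> physical bay number.
# Placeholder values; fill in from the labels on the actual drives.
BAY_BY_SERIAL: Dict[str, int] = {
    "SERIAL-A": 1,
    "SERIAL-B": 2,
    "SERIAL-C": 3,
}


def format_alert(server: str, device: str, serial: str, bays: Dict[str, int]) -> str:
    """Compose a plain-language alert that names the physical bay, if it is known."""
    bay = bays.get(serial)
    where = f"bay {bay}" if bay is not None else "an unknown bay (update the bay table)"
    return (
        f"{server}: drive {device} (serial {serial}) in {where} has failed. "
        "Pull it out, insert a spare if one is available, and see the README for details."
    )


print(format_alert("server2", "/dev/sdc", "SERIAL-C", BAY_BY_SERIAL))
```

The table has to be updated whenever a drive is swapped, which is the trade-off against computing the bay from the controller layout.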

ck2qsuZT commented 7 years ago

"Then don't elect them."

Hard to do when no one is competent :-p Something is better than nothing, right? There's also the matter of elected officials lying, because there is no background check.

"You can program it. :-)"

I accept your challenge

"Why did you push it in? Remove it from the bay altogether now, so that the bay is empty (no drive, just the plastic holder)."

Why didn't you do that ;-)

mitar commented 7 years ago

Why didn't you do that ;-)

It seems I forgot. This is really not good. I should have thrown it away immediately. There should be no broken hard drives around.

So please remove it, and throw it away.

Have you found a replacement for the server2 disk? If not, no worries, just reduce the RAID-1 array. You do not have to buy anything. But maybe there is a 1.5 TB disk lying around.

Hard to do when no one is competent :-p Something is better than nothing, right?

Not true. Then you can delegate to a central level, for example.

ck2qsuZT commented 7 years ago

Not sure how to throw the server3 disk away without losing the screws.

I didn't find a replacement drive but I've shrunk the server2 array already

ownCloud throws an nginx 502 error now; not sure if this is related at all.

ck2qsuZT commented 7 years ago

Never mind on the 502, it went away.

mitar commented 7 years ago

Never mind on the 502, it went away.

Yes, Docker takes some time to detect all child containers.

mitar commented 7 years ago

Not sure how to throw the server3 disk away without losing the screws.

Are they special screws? I think they are just standard screws, so you can store them with the others.

Or take the yellow duct tape and stick them to the inside of the enclosure.

I didn't find a replacement drive but I've shrunk the server2 array already

Have you updated the README?

ck2qsuZT commented 7 years ago

Done, I'll upload some useful photos and close the request later.

ck2qsuZT commented 7 years ago

[Photos: img_20170506_095635, img_20170506_095643, img_20170506_095704]

ck2qsuZT commented 7 years ago

Empty screws inside smirk smirk

mitar commented 7 years ago

Looks great!

ck2qsuZT commented 7 years ago

Restricts airflow a bit, but probably not significantly.