c0xc / CapacityTester

Use CapacityTester to check if your USB thumb/flash drive lies about its capacity. Graphical tool to detect fake USB drives.
GNU General Public License v3.0

[Feature Request] Ability to specify raw device #14

Closed OdinVex closed 8 months ago

OdinVex commented 11 months ago

This tool doesn't detect /dev/ devices, such as /dev/hda, sda, or nvme, which is quite limiting. A lot of fake drives come out of China, and I wouldn't want anyone to get scammed by fakes such as fake SSDs or NVMe drives. (Also tested with -test /dev/nvme0n1; it keeps treating it as a 'volume' rather than a device. I don't know anyone who would want to test volumes; it seems a bit... lacking compared to being able to test a device.)

c0xc commented 11 months ago

Thanks for your comment - interesting idea. Actually, I've added the new "disk test" feature recently, which can and should be used instead of the volume test (which works on mounted filesystems only). So the disk test is what you're after, except for two things:

  1. I'm only displaying USB devices at the moment, ignoring all other devices.
  2. This new test feature was made to test fake USB drives quickly, and it's fast because it does not check every single byte (currently it checks blocks at MB intervals). For fake SSDs, this type of test should work in the same way, but I've never had a fake SSD so I haven't checked.

As for the first point, I guess I could just add an option to disable the USB filter...

OdinVex commented 11 months ago

You could add an option to disable the USB filter; that would certainly allow people to test other disk drives. As for skipping space during checks... that should probably be off by default, and the user should be informed if it is kept on by default, because then it isn't a genuine test of capacity. For now, I wrote a script that writes unique data to each sector of a suspected drive and then reads it all back to compare against the expected data (a large hash based on a unique key and the sector index). Slower, but genuine, real.

Edit: I don't know if your option is on or off by default, I couldn't get that far to test things with CT.

c0xc commented 11 months ago

I think I'll add an option to disable the USB filter, being able to test an SSD sounds useful. I should probably add some sort of check or extra warning so that users won't accidentally wipe the wrong disk.

As for the testing algorithm - my original idea was to write this new disk test feature with the option to select an algorithm. But at some point I realized that all fake drives (that I know of) work in the same way: the first couple of GB are always usable, and they seem to differ only in how they misbehave when writing beyond the real capacity. So for now, this feature has only this one algorithm, which is meant to find the boundary - the position after which write requests are either ignored or previously written data is overwritten. It works reliably with all fakes that I've checked, and I think it's more efficient because those bytes in between do not make a difference. If you are aware of a completely different type of fake drive where this test does not work, please do let me know.

That being said, I understand that a user might want to test every single byte. That would serve a different purpose, but a legitimate one. So I'll consider it.

But to state the obvious: testing every byte will be too slow. The old volume test feature (which requires a mounted filesystem) already does that. In fact, I do have a fake SSD which claims to have 4 TB of storage. As far as I remember, it actually contains a tiny SD card, and it's even slower than other fake drives. I think at one point I tried to fill it up completely and gave up after two weeks; I don't think I even filled the real capacity, it was so slow. So when I wrote the new disk test feature, I didn't see the use case for testing every single byte. But that's just my opinion. If I add an "all bytes included" testing algorithm, it would certainly not be the default setting because of the slowness; I don't want users to give up when they see how slow it really is.

OdinVex commented 11 months ago

I should probably add some sort of check or extra warning so that users won't accidentally wipe the wrong disk.

Better safe than sorry.

If you are aware

You must write at least the minimum amount of data needed to verify the claimed capacity; there are plenty of techniques to work around simpler detection methods. For instance, a drive can simply treat write #1 aimed at sector 0 and write #2 aimed at sector 1024 as if they were sectors 0 and 1, right next to one another. Looking for sector 512 later? It's now physically located after 1024's data, at physical sector 2. This works up to a minimum size, regardless of boundaries; eventually it wraps around, or stores writes modulo the maximum capacity. I came across a Chinese site somewhere around 2014 mentioning techniques being discussed for detecting this, because they too were getting sold fakes in batches. They ended up with the same wisdom: fill at least as much space as you want verified with unique per-sector data that can be predictably checked, based on hashing a unique key. *shrug*

I didn't even fill the real capacity because it was so slow. So when I wrote the new disk test feature, I didn't see the use case for testing every single byte.

I found the bottleneck of the loop to be the hash generation. A typical SSD can sustain 350-550 MB/s writes and reads. Most modern CPUs support hardware-accelerated AES, at several GB/s in many cases. Using a simple X-byte key that is unique to each sector (maybe including the sector index somewhere) and feeding that into hardware-accelerated AES could give you enough bytes to raw-write the entire drive at close to its maximum speed. Reading should be just as easy; just remember to write everything first, then read it all back.

Edit: Take the optimal buffer size into account too. Some drives like 512 bytes, some 4K, some 1M-16M (mostly NVMe, in my experience). That made a several-hundred-fold difference in my own write speed, but I wrote mine in bash and had to pipe through sha1sum, which bottlenecked at the algorithm; I should be doing it in C/C++.

c0xc commented 11 months ago

You must allocate the minimum required verification of capacity...

How would you define this minimum?

They ended up with the same wisdom, allocate at least as much space as you want verified ...

Are you suggesting the user should be able to select how much space s/he wants to have verified? Say, your fake drive claims to have 4T but you get asked how much to verify and type in "64G"?

I found the bottleneck to the loop to be the hash generation

Okay, but that wasn't the bottleneck with the fake "SSD" I mentioned. It was the I/O (which was stalling), not the CPU. Again, it ran for about two weeks with speeds under 1 MB/s; that's why I mentioned it - nobody would wait for that to finish.

And as for block sizes, I'm already doing direct I/O, so all writes fill exactly one block. That was part of my effort to solve the main problem, which is caching.

OdinVex commented 11 months ago

How would you define this minimum?

If it says it is 4TB, you must write all 4TB.

Are you suggesting the user should be able to select how much space s/he wants to have verified? Say, your fake drive claims to have 4T but you get asked how much to verify and type in "64G"?

No, see above.

Okay, but that wasn't the bottleneck with the fake "SSD" I mentioned. It was the i/o (that was stalling), not the cpu. Again, it ran for about two weeks with speeds under 1 MB/s, that's why I mentioned it - nobody would wait for that to finish.

??? I was speaking anecdotally about the script I wrote having a bottleneck around bash shell launching sha1sum...

And as for block sizes, I'm already doing direct i/o, so all writes fill exactly one block. That was part of my effort to solve the main problem which is caching.

If you're familiar with dd, I was talking about how, for example, bs=4K is better than bs=1, and about allowing users to specify bigger writes to maximize I/O.

c0xc commented 11 months ago

If it says it is 4TB, you must write all 4TB.

Got it. I'm planning to add that option.

I was speaking anecdotally about the script I wrote ...

Sorry, I read that wrong.

I was talking about bs=4K being better than bs=1

Sure, writing 1-byte "blocks" (i.e., rewriting a block to change one byte) would be terribly inefficient. As I said, I'm already using direct I/O, which requires writing full blocks; in other words, that's what I'm doing already. But thanks for the hint anyway.

OdinVex commented 11 months ago

Sure, writing 1-byte "blocks" (i.e., rewriting a block to change one byte) would be terribly inefficient. As I said, I'm already using direct I/O, which requires writing full blocks; in other words, that's what I'm doing already. But thanks for the hint anyway.

I know you said you were using direct I/O; I was talking about making sure any algorithm you use to generate the unique data is fast enough to keep up with the maximum speed of the drive. E.g., if an algorithm can only produce 32 MB/s of output (launching tools such as sha1sum from a bash shell can be terribly slow), it isn't utilizing an SSD's ~350-560 MB/s write maximum (decent ones). I'm just pointing out that the algorithm you use should be fast enough not to be the bottleneck, like my preliminary script was. I ended up implementing it in C++ and it drastically increased my speeds; I can max things out, and instead of taking weeks for a simple 1 TB, it takes under 10 minutes (NVMe PCIe, ~1.5 GB/s average, flushed).

c0xc commented 10 months ago

to generate the unique data should be fast enough to keep up with the maximum speed of the drive

Good point. All fake drives I've had were so slow that I didn't have to think about a possible cpu bottleneck, but if you want to run a full test on a genuine SSD, it would matter...

Thanks for your ideas, I really appreciate it!

OdinVex commented 10 months ago

to generate the unique data should be fast enough to keep up with the maximum speed of the drive

Good point. All fake drives I've had were so slow that I didn't have to think about a possible cpu bottleneck, but if you want to run a full test on a genuine SSD, it would matter...

Thanks for your ideas, I really appreciate it!

You are most welcome, thank you for creating this nifty tool. :)

c0xc commented 10 months ago

I've been playing around with it.

[Screenshot: Screenshot_20230827_003837]

It's still lacking a reliable check if the selected storage device is mounted (there's a superficial check though), but I ran out of time. I think I'll make a new release soon.

OdinVex commented 10 months ago

the selected storage device is mounted

When you say that, do you mean /dev/... "mounted" or do you mean volumes? I'd prefer writing low-level such as /dev/sd[a-z] directly without even bothering with partitions.

c0xc commented 10 months ago

When you say that, do you mean /dev/... "mounted" or do you mean volumes? I'd prefer writing low-level such as /dev/sd[a-z] directly without even bothering with partitions.

I wasn't clear, I meant storage devices, like /dev/sd[a-z]+. In the next release, you will be able to select any block/storage device and run a full test on it. What I meant was the safety check that checks if the selected device contains a partition (like /dev/sda2) with a filesystem that's mounted. This check isn't very reliable yet, but should work in most cases. The test that follows does not care about those partitions, it'll overwrite everything.

OdinVex commented 10 months ago

What I meant was the safety check that checks if the selected device contains a partition (like /dev/sda2) with a filesystem that's mounted.

And what does it do with that information? Require a partition, or warn the user that partition(s) will be overwritten? It should warn without bothering to check for partitions: "Any/all data on this drive will be overwritten." Some people use encrypted volumes raw on a device, which won't show any partitions at all.

c0xc commented 10 months ago

And what does it do with that information? Require a partition, or warn the user that partition(s) will be overwritten?

The latter - it will show a warning. It's just a safeguard before the start to prevent the user from wiping the wrong disk. The disk test routine itself does not check for partitions.

c0xc commented 10 months ago

So I'm releasing v0.6, which includes a full disk test and the option to select any non-USB block storage device. Let me know if it works for you.

To show that both disk test options work, here's a screenshot showing the new full disk test which took 66 hours (almost 3 days) for a USB stick that claims to have 1 TB of storage:

[Screenshot: Screenshot_20230901_215438]

And here's the quick disk test which took less than 10 seconds to check the same USB stick. The numbers differ slightly because of a rounding error since the quick test rounds to the next GB.

[Screenshot: Screenshot_20230902_201441]

OdinVex commented 7 months ago

So I'm releasing v0.6, which includes a full disk test and the option to select any non-USB block storage device. Let me know if it works for you.

I don't see a full-disk test option, only "All mountpoints/All filesystems/USB filesystems". Edit: I found the option (it should not be in the menu, it should be under Select Drive...). Also, the font selection on that dialog makes my eyes a little sore. The name is also wonky, "Generic-MassStorageClass-DEVID"; you might need to double-check where you get Name and Vendor from. Edit: After entering a password to begin, the dialog stays open while the main window is stuck behind it showing the actual information. You also shouldn't say "is genuine", you should say "appears to be genuine".

c0xc commented 7 months ago

Thanks for your feedback!

UI: I know I have to fix the window layout. For now, I wanted to have a working version so I could use the new disk test routine. I have noticed some things I could improve, but I first wanted to get the new test routine to a working state. Since it requires different options than the old volume test (main window, select drive etc.), I created that wizard-like dialog, but then I put it in the "advanced" menu and used the progress bars in the main window because that was the easiest workaround I came up with at the time, short of redesigning the view. Like I said, I wanted to make it functional first and improve the UI later. I am aware that it looks a bit awkward now; it's not supposed to stay like that. The disk selection is not under "select drive" because that's the volume selection (filesystems). I think instead of the old main window, I'll first display a main wizard, something like "do you want to a) run a volume test, b) run a disk test, c) format the drive"; then there'll be no confusion anymore.

Yeah, some devices report no real id/serial, then you'll get something like "DEVID". I'll use another name for the list entries, probably something like what I'm already displaying in the "name" field.

Thanks for the hint about the wording. I agree, "appears to be" sounds better, will be fixed.

I appreciate all those hints about the design but have you had a chance to try the full disk test with a known fake/faulty disk and did you get the expected result?

OdinVex commented 7 months ago

Yeah, some devices report no real id/serial, then you'll get something like "DEVID". I'll use another name for the list entries, probably something like what I'm already displaying in the "name" field.

Mine does have a Manufacturer and Model shown in dmesg. Something is amiss, not sure what, though.

I appreciate all those hints about the design but have you had a chance to try the full disk test with a known fake/faulty disk and did you get the expected result?

I don't have any faulty disks or anything; I threw the one fake out ages ago. I did run a full-disk check on the one SD card that I know is legitimate, and it works fine, but I don't have any fakes to test with.