`/etc/machine-id` should not be inherited from templates

emanruse commented 10 months ago

Qubes OS release

4.1.2

Brief summary

Currently, all VMs based on a particular template inherit its /etc/machine-id, because it is persistent. This has privacy implications.

From machine-id documentation:

"This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network. If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly."

Steps to reproduce

cat /etc/machine-id in template and VMs using it.

Expected behavior

Qubes OS's templates are essentially golden images. As also described in systemd's documentation, "each instance should automatically acquire its own identifying credentials on first boot", i.e. /etc/machine-id must not be shared across qubes.

Actual behavior

All qubes based on a certain template have the template's /etc/machine-id.

A simple and effective solution is to run this in the template:

touch /run/machine-id
ln -sfT /run/machine-id /etc/machine-id
sed -ri 's/#Storage=.*/Storage=volatile/g' /etc/systemd/journald.conf

After that, on each boot, the VM will have a new unique machine-id.

The last command ensures that the journal is volatile too (thus avoiding unnecessary writes to SSDs). Related issue:

https://github.com/QubesOS/qubes-issues/issues/8832
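A quick way to check whether the change above took effect is to test that /etc/machine-id is a symlink pointing into /run. The following is a minimal sketch, not part of the original report; the function name `machine_id_is_volatile` and its optional path argument are made up for illustration.

```shell
#!/bin/sh
# Succeed if the given path is a symlink whose target lies under /run,
# which is the layout produced by the `ln -sfT` command above.
# The path argument is hypothetical and defaults to /etc/machine-id.
machine_id_is_volatile() {
    path="${1:-/etc/machine-id}"
    # Anything that is not a symlink (for example a regular file) is on disk.
    [ -L "$path" ] || return 1
    target=$(readlink "$path")
    case "$target" in
        /run/*) return 0 ;;
        *)      return 1 ;;
    esac
}

if machine_id_is_volatile; then
    echo "/etc/machine-id is a symlink into /run"
else
    echo "/etc/machine-id is not a symlink into /run"
fi
```

Running it in a qube based on the modified template and comparing the output of `cat /etc/machine-id` across two boots would show whether the ID really changes.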

jamke commented 10 months ago

I agree. Privacy in Qubes OS is very bad. On the positive side: huge room for improvement.

ben-grande commented 10 months ago

This was already explained in the FAQ: what-about-privacy-in-non-whonix-qubes.

Whonix provides a fixed machine-id for all users.

Machine-id is one identifier of many; there are many ways to fingerprint a VM. If Qubes starts focusing on these aspects, it will be redoing work already done by Whonix and taking time away from developers who could be focusing on security issues.

emanruse commented 10 months ago

I am reporting this for Qubes OS. I am also showing what the original creators of systemd explain about machine-id.

Just because Whonix devs think/say something, does not automatically mean it is irrevocable and absolute. Tails (AFAIK) uses volatile machine-id.

marmarek commented 10 months ago

Tails (AFAIK) uses volatile machine-id.

Tails is focused (mostly) on privacy too. Standard Qubes VMs are not - as @ben-grande explained above.

That said, volatile machine-id will be a problem for StandaloneVM - where it should remain constant (and where a persistent journal also makes sense). But everywhere else, indeed machine-id shared between AppVMs may be problematic. Maybe we can specify it via kernel cmdline (systemd.machine_id=) based on VM's UUID property (which is guaranteed to be unique, yet persistent)?
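To illustrate the UUID idea: a machine ID is 32 lowercase hexadecimal characters, so a UUID in the usual 8-4-4-4-12 form maps onto it by dropping the dashes. The sketch below is only an illustration of that mapping, not how Qubes OS implements (or will implement) it. The function name `uuid_to_machine_id` and the sample UUID are made up, and systemd's kernel-command-line documentation spells the option `systemd.machine_id=`.

```shell
#!/bin/sh
# Convert a UUID (8-4-4-4-12 form) into the 32-character lowercase hex
# form used for machine IDs. Prints nothing and fails on malformed input.
uuid_to_machine_id() {
    id=$(printf '%s' "$1" | tr -d '-' | tr 'A-F' 'a-f')
    # Reject anything containing a non-hex character.
    case "$id" in
        *[!0-9a-f]*) return 1 ;;
    esac
    # Reject anything that is not exactly 32 characters long.
    [ "${#id}" -eq 32 ] || return 1
    printf '%s\n' "$id"
}

# Made-up example value, not a real qube UUID.
uuid="3d2f9c1e-7a4b-4c8d-9e6f-0123456789ab"
echo "systemd.machine_id=$(uuid_to_machine_id "$uuid")"
```

Because the value is derived from the qube's UUID, it would stay constant across boots of the same qube while differing between qubes that share a template.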

DemiMarie commented 10 months ago

Tails (AFAIK) uses volatile machine-id.

Tails is focused (mostly) on privacy too. Standard Qubes VMs are not - as @ben-grande explained above.

That said, volatile machine-id will be a problem for StandaloneVM - where it should remain constant (and where a persistent journal also makes sense).

Should the journal be persistent in some places but not others? It’s not too hard to make TemplateBasedVMs have a persistent journal.

But everywhere else, indeed machine-id shared between AppVMs may be problematic. Maybe we can specify it via kernel cmdline (systemd.machine_id=) based on VM's UUID property (which is guaranteed to be unique, yet persistent)?

What about renaming a qube or restoring it from backup?

marmarek commented 10 months ago

What about renaming a qube or restoring it from backup?

Both (currently) will result in a fresh UUID. But given those are rare events, I don't think it's a huge issue in practice.

marmarek commented 10 months ago

Should the journal be persistent in some places but not others? It’s not too hard to make TemplateBasedVMs have a persistent journal.

Making the journal persistent in TemplateBasedVMs may be useful in some cases too (but also, #830), but it isn't really the topic of this issue.

emanruse commented 10 months ago

Standard Qubes VMs are not [focused on privacy]

It seems relevant to clarify some things, both for the sake of completeness and to avoid further confusion:

Project focus

From the homepage of Tails:

"Activists use Tails to hide their identities, avoid censorship, and communicate securely."

From the homepage of Whonix:

"As handy as an app - delivering maximum anonymity and security."

"Whonix runs like an app inside your operating system - keeping you safe and anonymous."

So, Tails and Whonix are actually focused on anonymity, although they say "privacy".

Privacy != Anonymity

Privacy is about data confidentiality. Anonymity (not having a name) is about hiding one's identity (in a way, meta-data confidentiality). The two things may be related but they are not equivalent.

Example 1: A bank account is private. It is not anonymous though. So, there is no goal to preserve anonymity during transactions.

Example 2: A whistleblower may need to be anonymous, although the result of his activity is public. The goal is to protect anonymity, not the privacy of the data.

Confidentiality is a component of data security:

https://en.wikipedia.org/wiki/Infosec#Confidentiality

and Qubes OS is security-focused. This also makes it confidentiality (privacy) focused. It provides actual mechanisms for securing it. Whonix does not provide that. It relies on the existing mechanisms of Qubes and Tor for its goals and builds upon these existing systems.

How other mentioned projects handle machine-id

Whonix

By enforcing the same machine-id for every user, Whonix attempts to use a "hide in the crowd" approach (http://www.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/wiki/Protocol-Leak-Protection_and_Fingerprinting-Protection#Identifiers_Design_Goals), justifying it with:

1. "The Tor Project coined this Anonymity Loves Company (good web search term). Whonix attempts to be an extension of Tor. Therefore follows similar design principles."
2. Logic that quasi-identifiers (https://en.wikipedia.org/wiki/Quasi-identifier) (which they seem to call non-deterministic artifacts) can result in VM fingerprinting anyway.

There are several problems with that reasoning though:

1. Differential privacy is weak (https://en.wikipedia.org/wiki/Differential_privacy#Public_purpose_considerations), especially considering the obvious fact that Whonix users are a minority compared to all Tor users, who in turn are a minority compared to all other Internet users. I.e. hiding in a crowd makes sense only if the crowd is large enough.
2. The Whonix article says it is "realistically impossible" to disguise the fact that one is using Whonix. However, just because quasi-identification may be possible does not mean it should be deliberately facilitated, nor does it mean that machine-id (which is considered confidential by design, especially in untrusted and networked environments) should be made deliberately public. This does not make the crowd larger.

In summary, neither the logic nor its effect works for the actual project goal. This can be a long discussion and should be taken up with the Whonix devs. In the Whonix forums there is at least one thread leading nowhere (http://forums.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/t/anonymize-etc-machine-id/7721).

Tails

An 8-year-old open issue:

https://gitlab.tails.boum.org/tails/tails/-/issues/7100

without resolution. They also seem to imagine some crowd.

That pretty much summarizes the situation with these so-called privacy-focused projects.

Maybe we can specify it via kernel cmdline (systemd.machine_id=) based on VM's UUID property (which is guaranteed to be unique, yet persistent)?

Which particular Qubes OS goal requires machine-id persistence?

I have it volatile in my templates (and hence qubes), it doesn't seem to cause any problems whatsoever. It is also easy to do, as explained.

marmarek commented 10 months ago

It seems relevant to clarify some things, both for the sake of completeness and to avoid further confusion:

Well, it's clearly stated in the FAQ already...

Which particular Qubes OS goal requires machine-id persistence?

For example, I see that some config files in the user's home are built based on machine-id (pulseaudio settings, for example). If machine-id changes, those a) will not be loaded correctly and b) will accumulate in large numbers over time (one for every machine-id). That's just one example, there are surely more.

emanruse commented 10 months ago

Well, it's clearly stated in the FAQ already...

What I explained is not stated in the FAQ. The FAQ section contains the same inaccurate implication that privacy and anonymity are the same thing. It also sets privacy against security in one sentence, which makes it even more contradictory, because confidentiality is an essential part of security, while anonymity is not. This confusion is a separate issue in itself.

For example, I see that some config files in the user's home are built based on machine-id (pulseaudio settings, for example). If machine-id changes, those a) will not be loaded correctly and b) will accumulate in large numbers over time (one for every machine-id).

Well, for pulseaudio I see those are generated on each machine-id change (i.e. on each reboot). Making ~/.config/pulse volatile solves b). I don't observe an issue with a), so persistence seems not required.

If machine-id is persistent in AppVM, that would make it persistent in disposables based on that AppVM, which would also contradict the paragraph quoted in the OP.

That's just one example, there are surely more.

Maybe we need to have a complete list to evaluate the actual effect of it.

We should probably note that machine-id is another systemd thing (by the "good" Red Hat who now explain to us that this anti-privacy feature should be confidential), so not using systemd would make it unnecessary. But I guess that's not an option.

marmarek commented 10 months ago

I don't observe an issue with a), so persistence seems not required.

Most likely because you don't change volume inside a qube. But users of sys-gui/sys-gui-gpu (which we aim to make more common in future releases) will see it more commonly.

If machine-id is persistent in AppVM, that would make it persistent in disposables based on that AppVM, which would also contradict the paragraph quoted in the OP.

With the method that uses the qube's UUID and the kernel cmdline (or another way to transfer that UUID into machine-id), that won't be an issue, as each newly created disposable qube has a fresh UUID.

Maybe we need to have a complete list to evaluate the actual effect of it.

Maybe. But IMO a more productive approach is to focus on what machine-id should be, based on its specification. It specifies that:

It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network

which currently indeed is broken; but also it specifies that:

generated from a random source during system installation or first boot and stays constant for all subsequent boots

which would be broken if it's generated randomly on each start of a persistent qube (be it standalone, or template-based one).

emanruse commented 10 months ago

Most likely because you don't change volume inside a qube.

Yes.

But users of sys-gui/sys-gui-gpu (which we aim to make more common in future releases) will see it more commonly.

I wish I could test this and provide feedback. Unfortunately, I am stuck with the GUI VM, unless someone explains how to proceed with this issue:

https://github.com/QubesOS/qubes-issues/issues/8657

As for UUID, thanks for explaining. I think you are right. That would better match the way it is supposed to work and won't be an issue for disposables.

DemiMarie commented 10 months ago

What about renaming a qube or restoring it from backup?

Both (currently) will result in a fresh UUID. But given those are rare events, I don't think it's a huge issue in practice.

Still, it will cause stuff to break, which isn’t awesome.

marmarek commented 10 months ago

Still, it will cause stuff to break, which isn’t awesome.

Preserving UUID across rename is probably fixable. For backup restore it's a bit more tricky (as you can restore a qube from a backup while that qube is already present, or restore it multiple times). But also, it isn't going to be much different from a qube clone, which would also need to result in a new UUID (and, to preserve the confidentiality of machine-id, a new machine-id too).

adrelanos commented 10 months ago

We need to consider the threat model. Which software is reading /etc/machine-id under which circumstances? Only locally running tracking software, which is either malware or software with anti-features. Until (what exactly can we call this?) local fingerprinting protection gets implemented, it's best to avoid running such software even inside VMs. However, I have called such a feature ever being invented "realistically impossible".

(http://www.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/wiki/Protocol-Leak-Protection_and_Fingerprinting-Protection#Identifiers_Design_Goals), justifying it with: 1. "The Tor Project coined this Anonymity Loves Company (good web search term). Whonix attempts to be an extension of Tor. Therefore follows similar design principles." 2. Logic that quasi-identifiers (https://en.wikipedia.org/wiki/Quasi-identifier) (which they seem to call non-deterministic artifacts) can result in VM fingerprinting anyway.

There are several problems with that reasoning though: 1. Differential privacy is weak (https://en.wikipedia.org/wiki/Differential_privacy#Public_purpose_considerations), especially considering the obvious fact that Whonix users are a minority compared to all Tor users, who in turn are a minority compared to all other Internet users. I.e. hiding in a crowd makes sense only if the crowd is large enough.

Once tracking software is locally running under the mentioned threat model, it is much better for users to at least use a VM. Otherwise the tracking software can read hardware information and even hardware serial numbers.

If using Whonix, what is the least bad choice here? An /etc/machine-id that is shared among the minority of Whonix users, or one that is unique? Locally running tracking software can find out under which operating system it is running anyhow (same for Windows, Debian, Qubes, Tails, Whonix, ...). Hiding this is, again, realistically impossible.

Locally running tracking software can also trivially create its own locally unique identifier: generate a random number, write it to a file in the home folder, and read it back after reboot.
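To make that point concrete, a self-made persistent identifier takes only a few lines. This is a minimal sketch for illustration; the function name `get_or_create_id` and the default file location are made up, and nothing here refers to any particular piece of software.

```shell
#!/bin/sh
# Print a locally unique identifier, creating it on first use.
# The default file name is a hypothetical example.
get_or_create_id() {
    id_file="${1:-$HOME/.cache/example-app-id}"
    if [ ! -s "$id_file" ]; then
        mkdir -p "$(dirname "$id_file")"
        # 16 random bytes rendered as 32 hex characters.
        head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$id_file"
    fi
    cat "$id_file"
}
```

Calling `get_or_create_id` twice returns the same value, and it keeps doing so after a reboot as long as the file lives in persistent storage, regardless of what /etc/machine-id contains.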

The Whonix article says it is "realistically impossible" to disguise the fact that one is using Whonix.

Right. Again, same for Windows, Debian, Qubes, Tails, Whonix, ...

We will also probably need to define the scope of your feature request. Naturally, users won't care about implementation specifics such as "/etc/machine-id". That's just a very specific technical implementation detail. To make sense of this, we probably need to define the user story in layman's terms: what would users actually like? I can imagine various things here...

1) "Qubes OS camouflage": Locally running tracking software cannot figure out it is being run inside Qubes.
2) "Debian OS camouflage": Running tracking software locally while the tracking software is unable to determine whether it is running in Debian, Kicksecure, Tails, Qubes or Whonix. All the tracking software can find out is that it is running in some Debian-based derivative.
3) "Linux camouflage": The tracking software doesn't even know it is running inside Debian, just a generic Linux.
4) "Windows OS camouflage": Actually running some sort of Linux, but running some Windows software securely in something like Wine (or a fork), with the Windows software thinking it is actually running in Windows $current_version. This I am going to call even more impossible. Nothing is going to reliably appear as Windows without running the original Windows binaries, which makes this a compromise between anti-local-code-execution fingerprinting and security (avoiding running Windows binaries).
5) "Anti anti-VM" or "anti-VM bypass": VMs capable of hiding the fact that they are VMs. Pretty much impossible without huge investments by virtualizers and the Linux kernel. There are some search results for that. It seems like a rabbit hole, a cat-and-mouse game where anti-VM will always win against the defenders (anti-VM bypass). Proprietary software (anti-cheat, serial number checks) and malware want VM detection. Malware analysts and privacy enthusiasts want anti-VM bypass.

This issue is not specific to Qubes, Debian, Tails, Whonix, etc. To my knowledge, there is no operating system that offers such a feature. Even the terminology is nonexistent. The awareness of this issue is nonexistent. So if someone wanted to make progress with this topic, they would need to find or invent terminology, explain the issue, and then draft feature requests to send to various projects, or found/fund the development of projects working on this.

However, just because quasi-identification may be possible does not mean it should be deliberately facilitated, nor does it mean that machine-id (which is considered confidential by design, especially in untrusted and networked environments) should be made deliberately public.

deliberately facilitated? deliberately public?

Tails An 8-year-old open issue: https://gitlab.tails.boum.org/tails/tails/-/issues/7100 without resolution.

This is evidence of how hard it is to find consensus on this topic. But please don't blame it on the pro-privacy projects that tracking software is doing whatever it can to track users and that other upstreams don't care about this issue. A mess created by thousands of people isn't trivially fixed by a handful of people.

There is research on anonymity (anonbib - Selected Papers in Anonymity), but at the time of writing I don't think there is any research related to local code execution anti-fingerprinting. Hence, it is difficult to reason about these things.

They also seem to imagine some crowd. That pretty much summarizes the situation with these so-called privacy-focused projects.

How exactly does the shared /etc/machine-id lead back to your real identity?

What do you call Tor? Also a so-called privacy-focused project, because Tor doesn't even implement any local code execution anti-tracking?

From that viewpoint, I wonder: do any real (not so-called) privacy projects exist?

If you look at the Whonix history... In summary... It was hard to have a VM that reliably routes all traffic over Tor. Whonix solves that. How is that not an improvement? It also does a lot of other stuff that is doable. But now you're shifting the goalposts. Now you want to include a threat model where tracking software is running with local code execution.

And if that's not provided, you call it a "so called privacy focused project".

Check out CPUID. How will you fix at least that?

Related link that was not referenced here yet:

https://www.whonix.org/wiki/VM_Fingerprinting

I interpret the Qubes FAQ entry "What about privacy in non-Whonix qubes?" as a preemptive rejection of such feature requests. It's not a stated project goal. It's even a deliberately excluded project goal. Qubes "only" wants to keep VMs safe from each other. One VM where malware is running should be unable to read data from other VMs. Hence a compromised browsing VM cannot read the gpg private keys stored in a vault VM. Privacy with regard to what information locally running malware can gather by executing inside the VM is, however, not a stated project goal.

And I don't blame Qubes for that. Knowing how ridiculously difficult (read: expensive) it would be to implement this, it seems only natural to exclude unrealistic goals.

This isn't even a feature request so to speak. It's kinda a "project request".

I don't see this happening. Except, perhaps "money talks".

Maybe someone like Marek could estimate or at least guesstimate how much it would cost to implement any of the above-mentioned features in work hours and/or monetary terms. But even expecting an answer to a request for estimates might be unrealistic. Doing the preliminary research and making the estimate also takes time, which is unlikely to be spent if the end result is just finding that out with no further action realistically happening.

Why did I say "realistically impossible"? Well, for this to happen, what would technically have to happen is changing the source files on other people's computers. That's not something I can easily do. Where? In upstream projects such as Linux, virtualizers, Debian, perhaps systemd. But they don't particularly care about my opinion. And that is fine and to be expected. There are thousands of people who want various stuff from them, basically asking them to spend their lifetime on it. All for free or even against payment. So this issue needs to be explained, and patches that are acceptable to upstream need to be written. That is a slow, crawling process and might hit a wall at some point because upstream doesn't care about this issue: they either don't see it as an issue, don't see it as important enough, or don't see it as realistic to solve.

Maybe it could happen if a billionaire or millionaire such as Mark Shuttleworth showed up, as he did when he founded Ubuntu with I don't know how many millions of USD. If that happens, yeah, maybe GNU Hurd or some other microkernel could be forked with local code execution anti-fingerprinting enshrined as a project goal, Xen could add anti local fingerprinting, etc. That's a very long shot and I find it unrealistic.

Disclaimer: This is my own opinion only. Not speaking for Qubes.

apparatius commented 10 months ago

I think the main concern with this issue that could be Qubes-specific is how to remove the unique fingerprints in the templates so the AppVMs based on the same template won't be undoubtedly linked. Of course, even without these specific fingerprints it'll be possible to link them to some degree based on cpuid/installed packages/etc., but that would no longer be a 100% sure link.

Here is an example: I'm using the default Whonix templates for gateway and workstation without any modifications and keep them up to date. I have the default Qubes OS whonix-ws-17-dvm that I use to start Tor Browser and visit websites. I want to keep two separate identities on github. To visit the github website I always start Tor Browser in a new disposable Whonix Workstation qube. Let's assume that github served me malicious javascript that ended up getting a local shell in the VMs where I separately visited my two github accounts. In those VMs the malicious software can look for unique template fingerprints like machine-id/template logs/etc. and determine with 100% certainty that these two VMs are based on the same template and that these identities are linked.

Now, to circumvent this, I can do the following:

1) Clone the default whonix-ws-17 template to be whonix-ws-17-general template.
2) Remove the default whonix-ws-17 template.
3) Install the default whonix-ws-17 template from the Qubes OS repository and clone it to be whonix-ws-17-github1 template.
4) Create a disposable template out of whonix-ws-17-github1 template.
5) Remove the default whonix-ws-17 template.
6) Wait for some time so the creation dates won't be too close.
7) Install the default whonix-ws-17 template from the Qubes OS repository and clone it to be whonix-ws-17-github2 template.
8) Create a disposable template out of whonix-ws-17-github2 template.
9) Remove the default whonix-ws-17 template.

I must update the templates at different times so they can't be linked by update time. I also need to name the disposable templates in a way that isn't linkable when malware reads the output of qubesdb-read /qubes-base-template from inside a disposable VM.

This was written without much thought so maybe this will still leave some shared fingerprints in the templates.

renehoj commented 10 months ago

I think the main concern with this issue that could be Qubes-specific is how to remove the unique fingerprints in the templates so the AppVMs based on the same template won't be undoubtedly linked.

Does removing the machine-id in any meaningful way make it more difficult to link VMs, through the template?

The VMs would still be using the same root FS; the filenames and timestamps across the root FS are probably a unique fingerprint in themselves.

You would also have data generated by the template when installing or updating software like dpkg.log, which would share the same timestamps across all VMs using the same template.

apparatius commented 10 months ago

Does removing the machine-id in any meaningful way make it more difficult to link VMs, through the template?

Removing just machine-id doesn't solve this issue of course.

The VMs would still be using the same root FS; the filenames and timestamps across the root FS are probably a unique fingerprint in themselves.

If the user doesn't make any changes to the default files in the template and only uses the system package manager to install/remove software in the template, then it's possible to use e.g. debugfs to reset all timestamps of each file to the modification timestamp the file was given when the package was created, e.g.:

$ stat /etc/profile
  File: /etc/profile
  Size: 769         Blocks: 8          IO Block: 4096   regular file
Device: 202,3   Inode: 785014      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-08-15 02:12:46.124000000 +0000
Modify: 2021-04-10 20:00:00.000000000 +0000
Change: 2023-08-15 02:12:20.514000000 +0000
 Birth: 2023-08-15 02:12:20.514000000 +0000
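As a partial illustration of the timestamp idea, the access time can be aligned with the modification time using ordinary tools. The sketch below assumes GNU coreutils (`stat -c`, `touch -d @EPOCH`); the function name `atime_to_mtime` is made up. It does not cover the Change and Birth fields shown above, since `touch` cannot set them, which is presumably why debugfs was suggested.

```shell
#!/bin/sh
# Set a file's access time to its own modification time (GNU coreutils).
# Change (ctime) and Birth times are left as they are: touch cannot set them.
atime_to_mtime() {
    mtime=$(stat -c %Y "$1") || return 1
    # -a limits the update to the access time; "@N" means N seconds since the epoch.
    touch -a -d "@$mtime" "$1"
}
```

For the file in the example above, `atime_to_mtime /etc/profile` (run as root) would make the Access line match the Modify line to the second, while Change would be updated to the current time as a side effect.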

You would also have data generated by the template when installing or updating software like dpkg.log, which would share the same timestamps across all VMs using the same template.

Logging in templates would need to be disabled, or the logs stored in private storage such as the template's /home directory.

I don't know if it is even possible to achieve this at all. Just some thoughts.

marmarek commented 10 months ago

To be clear, and also to somewhat echo what @adrelanos said: the privacy aspect of shared machine-id is not a focus for non-Whonix VMs. We are not going to duplicate the effort there. And also, machine-id alone is a very small part, not even in the top 10 (or top 100) of things to address to avoid linking VMs based on the same template (and as Patrick said, accessing it requires local execution, at which point there are a lot more ways to fingerprint a template). In fact, having the same machine-id across several users (all using the same template version) might improve privacy...

The reason why making machine-id unique is considered in Qubes OS at all, is because having it shared may break some applications that assume it is unique and persistent. Its documentation suggests it may be used to derive application-specific unique identifiers and there may be applications relying on this feature. Something like this happened before (although it was about MAC address, not machine-id).

emanruse commented 10 months ago

Re. pulseaudio, mentioned earlier (again from machine-id's docs):

"If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly. Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key."

The fact that pulseaudio uses it directly is an issue with pulseaudio, i.e. probably not something Qubes OS is supposed to take care of. They know about it:

https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/issues/1123

That principle seems applicable to all other software, i.e. we probably don't need a complete list.
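The keyed-hash recommendation quoted above can be sketched with HMAC-SHA256 from the `openssl` command-line tool. This is only an illustration under stated assumptions: the function name `app_specific_id`, the sample machine ID and the application key are made up, and the sketch does not claim to reproduce the exact derivation systemd performs in `sd_id128_get_machine_app_specific()`.

```shell
#!/bin/sh
# Derive an application-specific identifier from a machine ID with HMAC-SHA256.
# $1: machine ID, $2: fixed application-specific key (both supplied by the caller).
app_specific_id() {
    # openssl prints "<label>= <hex digest>"; the last field is the digest itself.
    printf '%s' "$1" | openssl dgst -sha256 -hmac "$2" | awk '{print $NF}'
}

# Made-up sample values for illustration only.
app_specific_id "00112233445566778899aabbccddeeff" "example-app-key"
```

An application using such a derived value gets a stable per-machine identifier without ever exposing the machine ID itself, and two applications with different keys cannot correlate their identifiers.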

@adrelanos

My previous reply was not intended to offend anyone or to start an extraneous discussion. It was a response to others who mentioned Whonix.

To avoid going further off-topic, I will answer only the machine-id related things you mention in the Qubes OS context. If you would like to discuss other things, please link to a relevant thread and we can do that.

Which software is reading /etc/machine-id under which circumstances?

I can open file:///etc/machine-id in both Firefox (Fedora) and Tor Browser (Whonix) (under no special circumstances), which means

- it is not confidential
- it is exposed in untrusted (networked) environment

Whether a browser extension or other JS can access it is at the mercy of the browser (sandboxing). As for other software, I have not investigated. The file is user readable. Even if it wasn't, in Qubes OS root is passwordless by default.

Only locally running tracking software, which is either malware or software with anti-features.

Or non-malware downloading and running arbitrary code (AKA JavaScript) that exploits bugs/vulnerabilities.

Until local fingerprinting protection gets implemented, it's best to avoid running such software even inside VMs. However, I have called such a feature ever being invented "realistically impossible".

I would be glad to know how to avoid running JavaScript in a JS-dependent forum or bug tracker, as well as why a well-known privacy-invasive technology is deliberately chosen for privacy-focused projects. Please link to where I can learn how to do that.

Locally running tracking software can find out under which operating system it is running anyhow (same for Windows, Debian, Qubes, Tails, Whonix, ...). Hiding this is again, realistically impossible.

Just because it is possible for locally running sophisticated malware to detect the OS (or that it is running in a VM), does not mean that:

- all malware is sophisticated enough to do this
- the OS should simply deliver a ready-made boot-resistant identifier in a well-known place, so every software can simply read it

deliberately facilitated?

Yes. Using the same machine-id for all users facilitates detection that "this is a user of this OS", thus not requiring any additional detection mechanisms from potential malware. This makes it possible even for the simplest malware to find out. Not having a persistent machine-id would at least make it more difficult, thus reducing the probability of easy fingerprinting.

deliberately public?

Whonix's machine-id is public info. Qubes templates are also publicly accessible.

How exactly does the shared /etc/machine-id lead back to your real identity?

By reducing the noise in the system. Volatile parameters increase noise, making system identification more difficult.

Now you want to include a threat model where tracking software is running with local code execution.

It is not that I want that. It just seems part of distrusting the infrastructure (which includes pretty much everything except Xen, dom0 and the distro itself).

All for free or even against payment.

The same applies to bug reporters. :)

adrelanos commented 10 months ago

Of course, even without these specific fingerprints it'll be possible to link them to some degree based on cpuid/installed packages/etc., but that would no longer be a 100% sure link.

The first additional thing that comes to mind is screen resolution:

xrandr

What percentage of certainty would be too much? 10%? 50%? I guess even a 10% certainty would be considered too much.

This was written without much thought so maybe this will still leave some shared fingerprints in the templates.

Would need to show the diff of the different VM images. Maybe using diffoscope.

Which software is reading /etc/machine-id under which circumstances? I can open file:///etc/machine-id in both Firefox (Fedora) and Tor Browser (Whonix) (under no special circumstances), which means - it is not confidential - it is exposed in untrusted (networked) environment Whether a browser extension or other JS can access it is at the mercy of the browser (sandboxing).

That would be quite catastrophic.

~/.mozilla/firefox/*.default-esr/cert9.db (browser password database)
~/.gnupg/secring.gpg (gnupg private keys)

Only locally running tracking software, which is either malware or software with anti-features. Or non-malware downloading and running arbitrary code (AKA JavaScript) that exploits bugs/vulnerabilities.

That's already covered by "malware".

Until local fingerprinting protection gets implemented, it's best to avoid running such software even inside VMs. Such a feature ever being invented, however, I called "realistically impossible". I would be glad to know how to avoid running JavaScript in a JS-dependent forum or bug tracker, as well as why a well-known privacy-invasive technology is deliberately chosen for privacy-focused projects. Please link to where I can learn how to do that.

JS is pretty much off-topic. (If used for fingerprinting, that's remote fingerprinting, not local fingerprinting.) Here are some related links:

Just because it is possible for locally running sophisticated malware to detect the OS (or that it is running in a VM), does not mean that:

all malware is sophisticated enough to do this

This makes it possible even for simplest malware to find out.

Which malware looks at /etc/machine-id at all? Which malware is sophisticated enough to exploit a flaw in the browser's JavaScript engine to then gain local execution and use /etc/machine-id for OS detection, but at the same time is easily tricked by superficial OS camouflage?

What I mean to say, this is not a realistic threat model.

the OS should simply deliver a ready-made boot-resistant identifier in a well-known place, so every software can simply read it

There are way too many of these places.

/etc/debian_version
/etc/os-release
dpkg -l | grep qubes
which qubesdb-read
/usr/share/qubes/marker-vm
https://www.qubes-os.org/faq/#what-is-the-canonical-way-to-detect-qubes-vm
xrandr (output contains DUMMY, which I have only ever seen in Qubes)
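As a rough illustration of how little effort such detection takes, the two Qubes-specific checks from the list above can be sketched in a few lines of shell. The function name `qubes_markers` and the optional root-directory argument are hypothetical, added only so the check can be pointed at a mounted image instead of the running system.

```shell
#!/bin/sh
# Sketch only: report which Qubes markers named in this thread are present.
# qubes_markers [ROOT]   (hypothetical helper; ROOT defaults to /)
qubes_markers() {
    root="${1:-/}"
    root="${root%/}"   # "/" becomes "", so paths below stay well-formed

    # Canonical check per the Qubes FAQ: existence of the marker file.
    if [ -e "$root/usr/share/qubes/marker-vm" ]; then
        echo "marker-vm"
    fi

    # qubesdb-read on PATH; only meaningful for the running system.
    if [ -z "$root" ] && command -v qubesdb-read >/dev/null 2>&1; then
        echo "qubesdb-read"
    fi
    return 0
}
```

Calling `qubes_markers` with no argument inspects the running system; neither check involves /etc/machine-id.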

It would probably cost way less than 10000 USD to develop a library for reliable OS detection that can defeat superficial OS hiding attempts. /etc/machine-id wouldn't even have come to mind. Such stuff might even already exist as open source.

This makes it possible even for simplest malware to find out.

The simplest malware doesn't run on Linux, doesn't do OS detection and doesn't attempt to detect Qubes. More sophisticated malware might use anti-VM techniques to avoid detection and analysis. Sophisticated, tailored malware against Qubes would probably use something like /usr/share/qubes/marker-vm (knowing the FAQ) or which qubesdb-read (if not reading the FAQ) to detect whether it is being run inside Qubes.

Not having a persistent machine-id would at least make it more difficult, thus reducing the probability of easy fingerprinting.

Not really, as this isn't the canonical way to detect Qubes and there are many other simpler, more common ways to detect Qubes. So if you want an anti-Qubes-detection feature, I suggest opening a separate ticket (if this ticket wasn't clear enough).

Now you want to include a threat model where tracking software is running with local code execution. It is not that I want that.

/etc/machine-id obfuscation would be part of an OS camouflage feature.

Since the canonical way to detect Qubes VM is quote "Check /usr/share/qubes/marker-vm file existence", there is a supported way to detect Qubes that doesn't even need /etc/machine-id.

Unless Qubes chooses to implement anti-OS detection (which as seen in this ticket the answer apparently is "no"), I don't think it makes sense to modify /etc/machine-id. Changes to /etc/machine-id however might make sense for reasons other than privacy, I am not sure yet. (https://github.com/QubesOS/qubes-issues/issues/8833#issuecomment-1880069005)

emanruse commented 10 months ago

JS is pretty much off-topic.

So are xrandr, diffoscope, Whonix and what not.

What I mean to say, this is not a realistic threat model.

When you introduce an avalanche of questions and someone spends time to answer them, after which you swiftly brush away observable verifiable facts as "not realistic", there isn't much to say further.

adrelanos commented 10 months ago

I am now convinced that it would be better to have:

A) /etc/machine-id in App Qube; different from
B) /etc/machine-id in Template.

reasons:

1) When using journalctl in App Qube it is confusing that it contains the journal generated in Template.
2) Maybe there is even a security reason why App Qube should not know the journal from the Template?
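The journal confusion in 1) follows from systemd keeping the persistent journal under /var/log/journal/<machine-id>/, so two systems with the same ID share one journal directory name. A minimal sketch for checking whether two root filesystems carry the same ID is below; the helper name `same_machine_id` and the idea of passing mounted roots as arguments are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: compare /etc/machine-id under two root directories.
# same_machine_id ROOT_A ROOT_B   (hypothetical helper)
# Exit status: 0 = identical IDs, 1 = different or empty, 2 = unreadable.
same_machine_id() {
    a=$(cat "${1%/}/etc/machine-id" 2>/dev/null) || return 2
    b=$(cat "${2%/}/etc/machine-id" 2>/dev/null) || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_machine_id /mnt/template /mnt/appvm && echo shared` would print `shared` when both mounted roots (hypothetical mount points) have the same ID.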

Maybe we can specify it via kernel cmdline (systemd.machine_id=) based on VM's UUID property (which is guaranteed to be unique, yet persistent)?

Should work. I looked up the manual just now. Seems pretty clear.

https://www.freedesktop.org/software/systemd/man/latest/machine-id.html

The machine ID may be set, for example when network booting, with the systemd.machine_id= kernel command line parameter or by passing the option --machine-id= to systemd. An ID specified in this manner has higher priority and will be used instead of the ID stored in /etc/machine-id.

This gives nice flexibility to use different IDs for App Qubes vs Template.
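A minimal sketch of the UUID idea, assuming the VM's UUID is available as a string in the usual dashed form: machine-id(5) expects 32 lowercase hexadecimal characters and rejects an all-zero ID, so the UUID only needs its dashes removed and its letters lowercased. The helper name `uuid_to_machine_id` is hypothetical, and how Qubes would actually pass the value to the kernel command line is not shown here.

```shell
#!/bin/sh
# Sketch only: turn a dashed UUID into a machine-id style string.
# uuid_to_machine_id UUID   (hypothetical helper)
uuid_to_machine_id() {
    # Strip dashes and lowercase the hex digits.
    id=$(printf '%s' "$1" | tr -d '-' | tr 'A-F' 'a-f')

    # Reject anything that is not purely hexadecimal.
    case "$id" in
        *[!0-9a-f]*) return 1 ;;
    esac

    # machine-id(5): exactly 32 characters, and not all zeros.
    [ "${#id}" -eq 32 ] || return 1
    [ "$id" != "00000000000000000000000000000000" ] || return 1

    printf '%s\n' "$id"
}
```

The result could then be appended to the kernel command line as `systemd.machine_id=<id>`, the parameter quoted from the manual above.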

On topic, (and please correct me if I am wrong):

xrandr: because in the context of privacy from local code execution, malware can use it to detect Qubes and fingerprint the VM.
diffoscope: because it is a tool to compare two different VM images for the purpose of reproducible builds or reducing file-based VM fingerprinting.
Whonix: because of the Qubes policy what-about-privacy-in-non-whonix-qubes (a link that Marek posted). The privacy aspect basically boils down to "maybe Whonix implements that but non-Whonix Qubes templates won't". And because Whonix unfortunately also realistically cannot hide the OS from local code execution, and it's being discussed why not, I am explaining why. Since this feature request comes up every now and then, I will document the issues and challenges here: https://www.kicksecure.com/wiki/System_identity_camouflage

emanruse commented 10 months ago

I am now convinced that it would be better to have:

A) /etc/machine-id in App Qube; different from

B) /etc/machine-id in Template.

:)

On topic, (and please correct me if I am wrong):

The whole subject of fingerprinting is off-topic (although it is related and worth discussing separately). It is just too big to fit here (and implies many more issues). If you have a proper discussion thread about it, share a link to it. Maybe we can figure what can be improved.

adrelanos commented 10 months ago

I am now convinced that it would be better to have: A) /etc/machine-id in App Qube; different from B) /etc/machine-id in Template. :) On topic, (and please correct me if I am wrong): The whole subject of fingerprinting is off-topic

I brought up that topic because you mentioned it in the original post here in the context of privacy, but that is only relevant in the case of local code execution. Considering that, however, opens up the full-blown local code execution anti-fingerprinting discussion.

It however makes sense to ignore privacy in this ticket and only go for different machine IDs in Template vs App Qube for the purpose of not confusing systemd journal.

(although it is related and worth discussing separately).

Right.

It is just too big to fit here

Right. Such features would be best if described more generally, more thoroughly, requested more directly.

(and implies many more issues).

That's for sure.

If you have a proper discussion thread about it, share a link to it. Maybe we can figure what can be improved.

The ones I collected so far:

https://forum.qubes-os.org/t/how-to-hide-the-fact-that-im-qubes-os-from-telegram/22934
https://forums.whonix.org/t/qubes-identifiers/17994
Whonix forums, Qubes forums, Qubes issue tracker also have some discussions on CPUID, which is also very much related.

Other than that you could look at my feature requests descriptions on https://www.kicksecure.com/wiki/System_identity_camouflage, see what you agree with or not, rephrase and then post bug reports and/or feature requests against any responsible projects such as Linux, Xen, Debian, Fedora, ...

Feature request against Kicksecure, Whonix: Not needed. Above forum threads could be considered the feature request and "closed" as upstream issue / ecosystem issue, cannot fix. New comments can be added there.

Qubes feature request: After this ticket and

https://github.com/QubesOS/qubes-issues/issues/1142 (basically closed as wontfix)
https://github.com/QubesOS/qubes-issues/issues/7523 (closed as duplicate of above)
https://github.com/QubesOS/qubes-issues/issues/4980

I don't think any more tickets would be promising, would just be kinda duplicates, but that's just my opinion, not speaking for Qubes.

emanruse commented 10 months ago

I brought up that topic because you mentioned in the original post here in context of privacy

I only mentioned that it has privacy implications, i.e. for potential consideration in a broader context.

but that is only relevant in case of local code execution.

I wonder why you keep talking about this as if a non-local one exists. Whether a file is downloaded and run, or JS runs inside a browser, it is the local CPU that runs it.

It however makes sense to ignore privacy in this ticket and only go for different machine IDs in Template vs App Qube for the purpose of not confusing systemd journal.

Agreed.

I will look at the links later. Thanks.

adrelanos commented 10 months ago

I only mentioned that it has privacy implications, i.e. for potential consideration in a broader context. but that is only relevant in case of local code execution. I wonder why you keep talking about this as if non-local one exists. Whether a file is downloaded and run, or JS runs inside browser - it is the local CPU that runs it.

There's a strong and important boundary.

It's true that the local CPU runs it but the difference is two different concepts:

A) A browser parses a website and processes it according to the rules that developers have set (the specification). By concept, no vulnerabilities are currently being abused, therefore there is no arbitrary code execution. The remote website cannot issue arbitrary instructions such as "poweroff" or CPUID. This is considered "non-local". Websites can only make use of what browsers provide (HTML, CSS, canvas, WebGL, whatnot), but no arbitrary machine code / local execution.
B) Local code execution: A program runs locally and can arbitrarily run whatever command it wants (such as CPUID). It is executed directly on the hardware and may be subject to limitations such as mandatory access control, kernel, virtualizer, whatnot.

JS run "remote" from a remote website in a local browser: Cannot read CPUID.

JS run locally (Node.js): Can read CPUID.

So local fingerprinting is a lot worse than remote fingerprinting, which is subject to browser restrictions. For example, at least there's no way to read CPUID remotely through a website (excluding vulnerabilities leading to remote code execution).
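To make the "local" side of that boundary concrete, here is a small sketch of what any locally running program can do without exploiting anything: read the CPU model string that the kernel exposes in /proc/cpuinfo (on x86 this is derived from CPUID). The helper name `cpu_model_name` and the optional file argument are hypothetical, and the "model name" field is an x86 assumption that other architectures may not provide.

```shell
#!/bin/sh
# Sketch only: print the first "model name" value from a cpuinfo-style file.
# cpu_model_name [FILE]   (hypothetical helper; FILE defaults to /proc/cpuinfo)
cpu_model_name() {
    file="${1:-/proc/cpuinfo}"
    # Strip the "model name<whitespace>:<whitespace>" prefix and keep the value.
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "$file" | head -n 1
}
```

A web page rendered under the browser's rules has no equivalent call available to it, which is the distinction drawn above.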

emanruse commented 10 months ago

https://leaky.page/

adrelanos commented 10 months ago

Spectre was a vulnerability that was hopefully fixed. It does not negate long established computer security concepts.

DemiMarie commented 10 months ago

Spectre was a vulnerability that was hopefully fixed. It does not negate long established computer security concepts.

You are too optimistic here, sadly. The correct fix for Spectre is Speculative Taint Tracking but no CPU vendor has implemented that.

emanruse commented 10 months ago

Spectre was a vulnerability that was hopefully fixed.

Strictly speaking, it was mitigated. A mitigation is not a fix. See for yourself:

lscpu | grep -i spectre

Consider also the fact that new side-channel vulnerabilities keep appearing every now and then.
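As a complement to the lscpu command above, the kernel also reports the status of each known CPU vulnerability as one file per issue under /sys/devices/system/cpu/vulnerabilities. The sketch below just prints them; the helper name `list_cpu_vulnerabilities` and the optional directory argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: print "name: status" for each CPU vulnerability file.
# list_cpu_vulnerabilities [DIR]   (hypothetical helper)
list_cpu_vulnerabilities() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (also covers an empty dir).
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Entries whose status begins with "Mitigation:" rather than "Not affected" illustrate the point that a mitigation is in place, not a fix.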
