Teknomancer / sysprocalc

Command-line expression-evaluator and x86 register descriptions
Apache License 2.0

Add support for register descriptions #3

Open Teknomancer opened 4 years ago

Teknomancer commented 4 years ago

RFE for a register descriptor (a more generic x86-register formatter).

Proposal:

  1. The register descriptor is intended to describe a range of bits.
  2. Allow specifying the byte order: big- or little-endian.
  3. Allow data widths of 8 bits or any multiple of 8 bits.
  4. Consider loading the bit format from a TOML file so that we need not recompile every time we add new registers.
  5. Some basic registers must be built in (e.g., all the x86 ones like CR4, EFER, etc.).
  6. Allow specifying device registers (e.g., APIC registers, AMD IOMMU registers, etc.).

Sample register descriptor TOML:

# EFER
arch = "x86"
device = "cpu"
name = "EFER"
description = "Extended Feature Enable Register"
byteorder = "little-endian"
bitcount = 64
bitspans = [
  { first=14, last=14, attr="rw", name="FFXSR", short="Fast FXSAVE/FXRSTOR", long="Fast FXSAVE/FXRSTOR support" },
  { first=13, last=13, attr="rw", name="LMSLE", short="LMSL Enable",         long="Long-mode segment limit enable (AMD)" },
  { first=12, last=12, attr="rw", name="SVME",  short="SVME Enable",         long="Secure Virtual Machine enable (AMD)" },
  { first=11, last=11, attr="rw", name="NXE",   short="NX Enable",           long="No-execute enable" },
  { first=10, last=10, attr="ro", name="LMA",   short="LM Active",           long="Long-mode active" },
  { first=8,  last=8,  attr="rw", name="LME",   short="LM Enable",           long="Long-mode enable" },
  { first=0,  last=0,  attr="rw", name="SCE",   short="SC Extensions",       long="System call extensions" },
]
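To make the proposal concrete, here is a minimal Rust sketch of how such a descriptor could be applied to a register value. The `BitSpan` fields mirror the TOML keys above, but the struct layout and the helpers `field_value` and `set_fields` are assumptions made for illustration, not sysprocalc's actual API.

```rust
/// One named span of bits; the fields mirror the `bitspans` keys in the sample.
#[derive(Debug, Clone, Copy)]
pub struct BitSpan {
    pub first: u8, // lowest bit of the span (inclusive)
    pub last: u8,  // highest bit of the span (inclusive)
    pub name: &'static str,
}

/// EFER spans copied from the sample descriptor, in the same order.
pub const EFER_SPANS: [BitSpan; 7] = [
    BitSpan { first: 14, last: 14, name: "FFXSR" },
    BitSpan { first: 13, last: 13, name: "LMSLE" },
    BitSpan { first: 12, last: 12, name: "SVME" },
    BitSpan { first: 11, last: 11, name: "NXE" },
    BitSpan { first: 10, last: 10, name: "LMA" },
    BitSpan { first: 8, last: 8, name: "LME" },
    BitSpan { first: 0, last: 0, name: "SCE" },
];

/// Extracts the value of one span from a 64-bit register value.
/// Assumes the span was validated beforehand (first <= last < 64).
pub fn field_value(value: u64, span: &BitSpan) -> u64 {
    let width = u32::from(span.last - span.first) + 1;
    // A full-width span needs an all-ones mask; `1 << 64` would overflow.
    let mask = if width >= 64 { u64::MAX } else { (1u64 << width) - 1 };
    (value >> span.first) & mask
}

/// Names of all spans whose bits are non-zero in `value`, in table order.
pub fn set_fields(value: u64, spans: &[BitSpan]) -> Vec<&'static str> {
    spans
        .iter()
        .filter(|s| field_value(value, s) != 0)
        .map(|s| s.name)
        .collect()
}
```

For the EFER value 0xd01, `set_fields(0xd01, &EFER_SPANS)` yields NXE, LMA, LME and SCE, since bits 11, 10, 8 and 0 are set.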

File format specification

| Key | Type | Description |
|-----|------|-------------|
| arch | Mandatory | Specifies the CPU architecture. Must be x86, arm or n/a. |
| device | Mandatory | Specifies the device name. cpu is special: registers under cpu do not require a fully qualified name (e.g., there is no need to type "cpu.efer 0xd01"). If the same name exists for different CPUs (e.g., both "x86.cpu.efer" and "arm.cpu.efer" exist), the command is rejected and the user is asked to supply the fully qualified name. |
| name | Mandatory | Specifies the name of the bit group. |
| description | Optional | Specifies the description of the bit group. |
| byteorder | Mandatory | Specifies the byte order. Must be little-endian or big-endian. |
| bitcount | Mandatory | Specifies the number of bits. Must be a multiple of 8. Maximum allowed is 256. |
| bitspans | Mandatory | Specifies the description of one or more bits in the bit group. |
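The name-resolution rule described for the device key could be sketched as follows. This is only one reading of the rule: `RegisterId`, `LookupError` and `resolve` are hypothetical names, and case-insensitive matching is an assumption (the example types "efer" for a register named "EFER").

```rust
/// Identifies one register; the values mirror the `arch`, `device` and `name` keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RegisterId {
    pub arch: &'static str,
    pub device: &'static str,
    pub name: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    NotFound,
    /// More than one register matched; the user must qualify the name further.
    Ambiguous(Vec<RegisterId>),
}

/// Resolves `query`, which may be "name", "device.name" or "arch.device.name".
/// A bare name is looked up under the `cpu` device only.
pub fn resolve(query: &str, known: &[RegisterId]) -> Result<RegisterId, LookupError> {
    let parts: Vec<&str> = query.split('.').collect();
    let (arch, device, name) = match parts.as_slice() {
        [name] => (None, "cpu", *name),
        [device, name] => (None, *device, *name),
        [arch, device, name] => (Some(*arch), *device, *name),
        _ => return Err(LookupError::NotFound),
    };
    let matches: Vec<RegisterId> = known
        .iter()
        .filter(|r| {
            r.name.eq_ignore_ascii_case(name)
                && r.device.eq_ignore_ascii_case(device)
                && arch.map_or(true, |a| r.arch.eq_ignore_ascii_case(a))
        })
        .copied()
        .collect();
    match matches.len() {
        0 => Err(LookupError::NotFound),
        1 => Ok(matches[0]),
        _ => Err(LookupError::Ambiguous(matches)),
    }
}
```

With both "x86.cpu.efer" and "arm.cpu.efer" registered, a bare "efer" returns `Ambiguous`, and the caller can then ask for the fully qualified name.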

bitspans field specification:

| Key | Type | Description |
|-----|------|-------------|
| first | Mandatory | The first bit in the span (inclusive). |
| last | Mandatory | The last bit in the span (inclusive). |
| attr | Mandatory | The attributes of the bits. |
| name | Mandatory | The name of the bits. |
| short | Mandatory | A short description of the bits. |
| long | Mandatory | A long description of the bits. |
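A loader would presumably want to check these constraints before accepting a descriptor. The sketch below checks the stated bitcount rule (a multiple of 8, at most 256) and that every span lies inside the register. Rejecting a bitcount of 0 and requiring first <= last are assumptions, as the specification does not spell them out; overlap between spans is not checked. `SpecError` and `validate` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SpecError {
    /// `bitcount` is zero, not a multiple of 8, or larger than 256.
    BadBitCount(u16),
    /// A span is reversed or reaches past the last bit of the register.
    SpanOutOfRange { first: u16, last: u16 },
}

/// Validates `bitcount` and a list of (first, last) spans, both inclusive.
pub fn validate(bitcount: u16, spans: &[(u16, u16)]) -> Result<(), SpecError> {
    if bitcount == 0 || bitcount % 8 != 0 || bitcount > 256 {
        return Err(SpecError::BadBitCount(bitcount));
    }
    for &(first, last) in spans {
        // Bits are numbered 0..bitcount, so `last` must be strictly below `bitcount`.
        if first > last || last >= bitcount {
            return Err(SpecError::SpanOutOfRange { first, last });
        }
    }
    Ok(())
}
```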

attr field specification:

| Key | Value |
|-----|-------|
| rw | Read-write. |
| ro | Read-only. |
| wo | Write-only. |
| rw1c | Read, Write-1-to-Clear. |
| ros | Read-only Status. |
| rw1cs | Sticky read-only Status, Write-1-to-clear. |
| rsvdp | Reserved and preserved. |
| rsvdz | Reserved and zero. |
| mbz | Reserved, must-be-zero. |
| mb1 | Reserved, must-be-one. |
| ign | Ignored. |
| und | Undefined. |

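Since attr takes a closed set of values, it maps naturally onto a Rust enum. The following is a sketch only: the `Attr` type, its variant names and the `is_reserved` helper are assumptions, and the string spellings are taken from the table above.

```rust
use std::str::FromStr;

/// One variant per value in the attr field specification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attr {
    Rw, Ro, Wo, Rw1c, Ros, Rw1cs, RsvdP, RsvdZ, Mbz, Mb1, Ign, Und,
}

impl FromStr for Attr {
    type Err = String;

    /// Parses the exact lowercase spellings used in the descriptor files.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "rw" => Attr::Rw,
            "ro" => Attr::Ro,
            "wo" => Attr::Wo,
            "rw1c" => Attr::Rw1c,
            "ros" => Attr::Ros,
            "rw1cs" => Attr::Rw1cs,
            "rsvdp" => Attr::RsvdP,
            "rsvdz" => Attr::RsvdZ,
            "mbz" => Attr::Mbz,
            "mb1" => Attr::Mb1,
            "ign" => Attr::Ign,
            "und" => Attr::Und,
            other => return Err(format!("unknown attr: {}", other)),
        })
    }
}

impl Attr {
    /// True for the four values whose description starts with "Reserved".
    pub fn is_reserved(self) -> bool {
        matches!(self, Attr::RsvdP | Attr::RsvdZ | Attr::Mbz | Attr::Mb1)
    }
}
```

Parsing into an enum up front means a typo such as "rw1" is rejected when the descriptor is loaded rather than surfacing later during formatting.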
Teknomancer commented 4 years ago

Figure out whether byteorder should be deduced from arch (when specified).

Is there a case where we want to format something as big-endian on x86? For devices, arch doesn't make sense anyway? Maybe it does for highly CPU-specific devices like the APIC. Not sure.
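To show what the byteorder key changes in practice, here is a small sketch in which the same raw bytes decode to different register values depending on the order assumed. `ByteOrder`, `read_u32` and `default_byte_order` are hypothetical names. The default covers only x86, which is little-endian; ARM can run in either order, so nothing is deduced for it here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    LittleEndian,
    BigEndian,
}

/// Interprets four raw bytes as a 32-bit register value in the given order.
pub fn read_u32(bytes: [u8; 4], order: ByteOrder) -> u32 {
    match order {
        ByteOrder::LittleEndian => u32::from_le_bytes(bytes),
        ByteOrder::BigEndian => u32::from_be_bytes(bytes),
    }
}

/// Possible default when `byteorder` is omitted: only deduced where `arch` fixes it.
pub fn default_byte_order(arch: &str) -> Option<ByteOrder> {
    match arch {
        "x86" => Some(ByteOrder::LittleEndian),
        // "arm" is bi-endian and "n/a" says nothing, so the file would have to specify it.
        _ => None,
    }
}
```

The bytes 01 0d 00 00 read as 0x00000d01 in little-endian order and as 0x010d0000 in big-endian order.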

Teknomancer commented 4 years ago

Rust apparently had some RFC for a BitSet (bit_set module), which was declared unstable and moved out into an external crate. It may or may not become part of the core language/std anytime soon.

So I've renamed BitSet in sysprocalc to SysBitSet. It's annoying, but what to do. I know namespaces exist, but I'd rather avoid confusion over the name if BitSet becomes part of the Rust language in the future.

Teknomancer commented 3 years ago

Found a better name: "BitGroup". It's much nicer than "SysBitSet", and there don't appear to be any conflicting names in existing Rust crates or in std.

Teknomancer commented 2 years ago

Hmm. Are chunks really a good idea? What if an id is used in the bit-descriptions vector instead? I.e., all bit descriptions that belong to a chunk would have the same id.

Teknomancer commented 2 years ago

Got rid of chunks.

Teknomancer commented 2 years ago

funty should be used rather than num_traits.

Also should I bother going the whole distance with bitvec?

I honestly don't care at the moment about big-endian but extending it in the future can be a pain. Maybe if bitvec is easy enough, then use it now.

Update: As of c089df196a9c52b0ed3051fd51614e2fba768b0c, the code uses bitvec and funty.

Teknomancer commented 1 year ago

I'm reconsidering whether using external files is such a good idea. At least initially, we probably want to have some stuff built in.

The convenience of having a single binary outweighs the need for adding new registers without recompiling since I'm likely to be the sole user of this app anyway.

Due to this, I've decided to put all the existing registers as static data in the binary itself. 3573721abf07a24bcb26141e87d9346e8db330c1 works towards that end. However, the BitGroup struct is a blocker right now because it contains a vector of BitSpan structs. In Rust, a Vec's contents are always heap-allocated, so a populated Vec can't be static data. I need to see whether I can convert these to arrays (or slices?) and still be able to construct them at run time in the future if needed...
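One possible way around the Vec blocker is to store the spans as `Cow<'static, [BitSpan]>`: built-in tables borrow a static slice, while groups built at run time own a Vec. This is a sketch under assumptions. The struct layouts are simplified stand-ins for the real BitGroup and BitSpan, and names stay `&'static str` to keep it short; names loaded at run time would need an owned or `Cow` string as well.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BitSpan {
    pub first: u8,
    pub last: u8,
    pub name: &'static str,
}

#[derive(Debug, Clone)]
pub struct BitGroup {
    pub name: &'static str,
    /// Borrowed for built-in tables, owned for groups built at run time.
    pub spans: Cow<'static, [BitSpan]>,
}

/// Two of the EFER spans, stored as constant data (no heap allocation).
const EFER_SPANS: &[BitSpan] = &[
    BitSpan { first: 11, last: 11, name: "NXE" },
    BitSpan { first: 8, last: 8, name: "LME" },
];

/// Built-in register description living in static data.
pub static EFER: BitGroup = BitGroup {
    name: "EFER",
    spans: Cow::Borrowed(EFER_SPANS),
};

/// Run-time construction still works; the spans are then heap-allocated.
pub fn custom_group(spans: Vec<BitSpan>) -> BitGroup {
    BitGroup { name: "custom", spans: Cow::Owned(spans) }
}
```

Code that only reads the spans can treat both cases alike, because `Cow<[BitSpan]>` dereferences to `&[BitSpan]`.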

Teknomancer commented 1 year ago

I want to keep the register descriptions outside the code, but I want the parsed TOML structs embedded in the binary at compile time. I.e., for built-in data/registers, I want to:

  1. Avoid opening files while executing the binary (bad for performance).
  2. Avoid parsing TOML while executing the binary (do this at compile time instead).

With regard to the above goals, here is what might be a good way of doing it. The toml and lazy_static crates are required, and Rust's built-in include_str! macro comes in handy.

Data file placed in root of the crate: data/efer.toml

# EFER
arch = "x86"
device = "cpu"
name = "EFER"

In Rust code:

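use lazy_static::lazy_static;

lazy_static! {
    // BitGroup<u64> must implement serde::Deserialize. include_str! paths are relative to this source file, hence CARGO_MANIFEST_DIR.
    static ref CPU_X86_EFER: BitGroup<u64> =
        toml::from_str(include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/data/efer.toml")))
            .expect("data/efer.toml must parse into a BitGroup<u64>");
}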

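I'm not yet sure if this will work, but it's worth trying. Note that this embeds only the TOML text at compile time; the parsing itself still happens at run time, on first access. Also, I don't think it would be possible to determine the size of the register (64-bit, 32-bit) prior to opening the file...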

Edit: Don't use lazy_static. Consider using OnceCell instead, because there's a chance it gets included in Rust's standard library itself (see rfcs/pull/2788 and rust/issues/74465).
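For reference, the thread-safe variant of this idea is available in the standard library as `std::sync::OnceLock` (stable since Rust 1.70), so a static can be initialized lazily without any external crate. The sketch below is self-contained under two stated assumptions: `EFER_TOML` stands in for the `include_str!` call, and `parse_header` is a naive stand-in for `toml::from_str` that only understands `key = "value"` lines. `Header` and `efer` are hypothetical names.

```rust
use std::sync::OnceLock;

/// Stand-in for `include_str!` on data/efer.toml, so the sketch needs no file.
const EFER_TOML: &str = "# EFER\narch = \"x86\"\ndevice = \"cpu\"\nname = \"EFER\"\n";

#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub arch: String,
    pub device: String,
    pub name: String,
}

/// Naive stand-in for `toml::from_str`: handles only `key = "value"` lines.
fn parse_header(text: &str) -> Option<Header> {
    let (mut arch, mut device, mut name) = (None, None, None);
    for line in text.lines() {
        // Lines without '=' (comments, blanks) are skipped.
        let Some((key, value)) = line.split_once('=') else { continue };
        let value = value.trim().trim_matches('"').to_string();
        match key.trim() {
            "arch" => arch = Some(value),
            "device" => device = Some(value),
            "name" => name = Some(value),
            _ => {}
        }
    }
    Some(Header { arch: arch?, device: device?, name: name? })
}

/// Parsed once on first use, then shared for the rest of the program.
pub fn efer() -> &'static Header {
    static CELL: OnceLock<Header> = OnceLock::new();
    CELL.get_or_init(|| parse_header(EFER_TOML).expect("built-in EFER data is valid"))
}
```

Every call to `efer()` returns the same `&'static Header`; the parse runs at most once.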

Teknomancer commented 1 year ago

Explore whether it's possible to change Register::arch and Register::device into enums while still being able to load them from TOML.
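One way this could look, sketched with the standard library only: arch becomes a closed enum parsed from the three strings allowed by the specification, while device keeps an open `Other` variant because device names (APIC, AMD IOMMU, ...) are not a fixed set. The `Arch` and `Device` types are assumptions. With serde, the same mapping could likely be hooked into TOML loading via attributes such as `#[serde(rename = "n/a")]` or `#[serde(try_from = "String")]`.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    X86,
    Arm,
    NotApplicable,
}

impl FromStr for Arch {
    type Err = String;

    /// Accepts exactly the values allowed for the `arch` key.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "x86" => Ok(Arch::X86),
            "arm" => Ok(Arch::Arm),
            "n/a" => Ok(Arch::NotApplicable),
            other => Err(format!("unknown arch: {}", other)),
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Device {
    /// The special device whose registers need no fully qualified name.
    Cpu,
    /// Any other device, identified by the name given in the file.
    Other(String),
}

impl From<&str> for Device {
    fn from(s: &str) -> Self {
        if s == "cpu" { Device::Cpu } else { Device::Other(s.to_string()) }
    }
}
```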