Open AndersonTorres opened 3 weeks ago
To introduce such a source type, we need to clearly specify when it applies, and when it does not and binaryBytecode should be used instead.
Things I believe it should apply to:
Things I believe it should not apply to:
Well, I believe binaryBytecode applies precisely to cases like the JVM, in which the code is a soup of bytes to be read by a virtual machine that does not correspond to a real-world computer. In this sense, strangely, binary files to be executed by Knuth's MMIX are binaryBytecode (since no one is crazy enough to implement it in hardware).
On the other hand, files from the IOCCC are fromSource regardless of their (lack of) readability.
Would splitting this into two categories, {readable,unreadable}MachineGenerated, be a good idea?
Further, there is at least one good reason to have the machineGenerated class: the Bootstrappable Project does not like machine-generated code like Haskell-to-C.
I don't think readable/unreadable is a good distinction. What I hoped to illustrate is that what we consider bytecode and what we don't is kinda arbitrary.
Consider CPython bytecode:
>>> def hello(a, b): return a + b
>>> hello.__code__.co_code
b'\x97\x00|\x00|\x01z\x00\x00\x00S\x00'
>>> import dis
>>> dis.dis(hello)
  1           0 RESUME                   0
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE
In its binary form we consider it bytecode, but in its disassembled form one might very well consider it machine-generated assembly. This bytecode is designed to run in the Python runtime. Minified JS, or maybe even JSFuck, is designed to run in the JavaScript runtime. Whether the parser is recursive or just a simple switch-case lookup is really just an implementation detail of the runtime. binaryBytecode is machine-generated code that requires a runtime or VM to run, as opposed to binaryNativeCode, which is inherently platform dependent.
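To make the "two views of the same thing" point concrete, here is a rough sketch using only the standard dis module. The helper name two_views is made up for illustration, and the exact opcode names and bytes differ between CPython versions, so treat the output as version dependent.

```python
import dis


def two_views(func):
    """Return (raw bytecode bytes, list of opcode names) for one function."""
    raw = func.__code__.co_code  # the "binary" view
    # The "assembly" view: one opcode name per decoded instruction.
    names = [ins.opname for ins in dis.get_instructions(func)]
    return raw, names


def hello(a, b):
    return a + b


raw, names = two_views(hello)
print(raw)    # opaque bytes, version dependent
print(names)  # readable mnemonics for the very same code object
```

Both values come from the same code object; only the presentation changes, which is why readability alone seems like a shaky basis for a category.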
I don’t think there’s any real difference between binary native code and a machine-generated pile of C except that you can open one of them in a text editor. The difference with binary byte code is, I guess, that it is expected to run on a VM that may or may not have some kind of sandboxing? But I don’t really know why sourceProvenance is so elaborate, or what use the distinction would be to people; to me it’s just from source, or not from source.
(But I agree that we need some way to represent this.)
While building Dorion from source (#265771), minified JS code is downloaded as an input source.
It is not code written by a human being. It is not meant to be readable by human beings (I had headaches trying to read it, believe me).
However, it is not binary machine code or bytecode either.
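For what it's worth, "not meant to be readable" can be roughly approximated mechanically. The following is only a heuristic sketch: the function name looks_minified and both thresholds are assumptions picked for illustration, not anything nixpkgs uses or measured values.

```python
def looks_minified(text, max_avg_line_len=200, min_whitespace_ratio=0.10):
    """Guess whether source text was machine-minified.

    Heuristic only: minified code tends to have very long lines and
    very little whitespace. Both thresholds are arbitrary assumptions.
    """
    lines = text.splitlines() or [""]
    avg_line_len = sum(len(line) for line in lines) / len(lines)
    whitespace = sum(ch.isspace() for ch in text)
    ratio = whitespace / max(len(text), 1)
    # Require both signals, so short but dense snippets are not flagged.
    return avg_line_len > max_avg_line_len and ratio < min_whitespace_ratio


readable = "function add(a, b) {\n  return a + b;\n}\n"
minified = "function a(b,c){return b+c}" * 50

print(looks_minified(readable))
print(looks_minified(minified))
```

A check like this would misjudge plenty of real files (generated but pretty-printed code, hand-written one-liners), so it illustrates the problem more than it solves it.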
I suggest this new source type: machineGenerated.
Well, the name can be a bit misleading, given that bytecodes are machine-generated too. I am open to suggestions!