decred / dcrdata

Decred block explorer, with packages and apps for data collection and storage. Written in Go.
https://dcrdata.decred.org/
ISC License

Decode OP_RETURN data on tx page #635

Open lealife opened 6 years ago

lealife commented 6 years ago

I have created a tx and added an OP_RETURN output at https://testnet.dcrdata.org/tx/92e6e6e7d2877e105435787fd73ec572cc85e3e620332cbfd65478c6c20aa0e2

OP_RETURN 48656c6c6f2c204465637265642e

Decoding the hex data 48656c6c6f2c204465637265642e as a UTF-8 string gives Hello, Decred.

Maybe the decoded data could be shown on this page.

reference: https://www.blocktrail.com/tBCC/tx/3bd425901bb4ddf2684e1bb85a5b65714a5835d78addf7b01c3aa674bacd8a4c

chappjc commented 6 years ago

Considerations:

dmigwi commented 6 years ago

Just so I understand this task well, I am supposed to implement a way that the op_return text can be decoded for the user on the frontend? Similar to this BTC implementation... Right?

(screenshot, 2018-09-11: a BTC block explorer showing a decoded OP_RETURN output)
dmigwi commented 6 years ago

Can someone help me narrow down all the possible text encodings I should look into? So far I've got hex, but it seems like other encodings are being used too.

chappjc commented 6 years ago

That is why this is a bit of a research project. There's UTF-8, ASCII, ANSI, etc.

To start, my suggestion is to look into how sites like https://cryptograffiti.info/ approach this (source: https://github.com/1Hyena/cryptograffiti).

Of course, the OP_RETURN data can be _anything_, so there's no way to know if you're decoding it right, or if there is even a proper decoding. My suggestion is to look into golang libraries that aim to solve this problem for us. I feel strongly that this is not a problem we should attempt to solve ourselves. Surely there are libraries that scan binary data for recognizable content, which may be text or binary formats like images and archives.

chappjc commented 6 years ago

I'm thinking along the lines of:

Then there are the considerations of doing everything efficiently (memory and CPU).
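One simple way to bound the memory and CPU cost is to cap how many bytes are ever inspected before any decoding runs. A hypothetical helper (the 256-byte limit is illustrative, not a dcrdata constant):

```go
package main

import "fmt"

// maxSniff caps how many OP_RETURN bytes are inspected; a hypothetical
// limit chosen for illustration.
const maxSniff = 256

// truncate returns at most maxSniff bytes of data, so downstream
// decoding work stays bounded regardless of payload size.
func truncate(data []byte) []byte {
	if len(data) > maxSniff {
		return data[:maxSniff]
	}
	return data
}

func main() {
	big := make([]byte, 1024)
	fmt.Println(len(truncate(big))) // 256
}
```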

chappjc commented 6 years ago

Also, I suggest approaching this task in pieces. Perhaps look at the content detection / decoding issue first, then worry about the front end stuff later.

dmigwi commented 6 years ago

Taking into consideration that OP_RETURN does not allow huge data sizes to be stored, I am giving top priority to decoding all of its data into sensible UTF-8 text. I will look into file decoding next.

lealife commented 5 years ago

I think there is no need to detect the text encodings. dcrdata should define one encoding (UTF-8) as the standard, and all users must follow that standard. Otherwise it will be complicated and messy! It should be a simple question.

chappjc commented 5 years ago

Well, how should we enforce this rule? OP_RETURN data can be anything at all. It could be JPEG data. This entire issue is purely for amusement.

patrickdugan commented 4 years ago

Hey everyone,

My two cents: make it a byte limit of 260 bytes or something like that. If someone can make a multi-input/output or multisig tx with 500 bytes, a simpler tx in terms of outputs with a 260-byte OP_RETURN payload is OK (making the whole tx about ~500 bytes).

Litecoin has 40 bytes, a holdover from 2014; we made TradeLayer tx codes fit into that. Bitcoin moved to 80 bytes. Might as well be a little experimental for other apps that don't have the same rigor applied. I like that we had to limbo in terms of design, but some things are counter-productive. For example, we add outputs for "reference addresses" that are then the destination for whatever is encoded in the OP_RETURN payload, so the payload doesn't have to encode the address itself. The extra output adds about 30 bytes, comparable to jamming the text into a bigger OP_RETURN payload. However, long-term the savings are net-negative, since each output must be redeemed with a ~120-byte signature (SegWit helps there). These little savings trade-offs may not matter all that much in the long run. Let people try bigger OP_RETURN payloads, why not.

ukane-philemon commented 1 year ago

@chappjc, is this still required?

chappjc commented 1 year ago

I don't care, personally. But it has never been completed. It was started in https://github.com/decred/dcrdata/pull/700 with a more open-ended decoding, which would probably have ended up being a DoS vector, but it was abandoned. Then https://github.com/decred/dcrdata/pull/934 took an ultra-simplistic approach that treats the data as UTF-8 bytes.

You're free to tackle this, but:

Just see what happens with a dumb utf8 interpretation of the nulldata push.
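A "dumb" UTF-8 interpretation like the one suggested above might look like this sketch: accept the payload only if it is valid UTF-8 made of printable runes, and fall back to hex otherwise (the helper name `asText` is hypothetical):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode"
	"unicode/utf8"
)

// asText returns the payload as a string if it is valid UTF-8 consisting of
// printable runes (plus whitespace); otherwise it reports false.
func asText(data []byte) (string, bool) {
	if !utf8.Valid(data) {
		return "", false
	}
	for _, r := range string(data) {
		if !unicode.IsPrint(r) && !unicode.IsSpace(r) {
			return "", false
		}
	}
	return string(data), true
}

func main() {
	payload, _ := hex.DecodeString("48656c6c6f2c204465637265642e")
	if s, ok := asText(payload); ok {
		fmt.Println(s) // Hello, Decred.
	} else {
		fmt.Println(hex.EncodeToString(payload)) // fall back to raw hex
	}
}
```

The printability check keeps control characters and binary junk from being rendered on the tx page while still displaying obvious text.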