blockchain-etl / ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
https://t.me/BlockchainETL
MIT License

Error running exchange_with_ipc.py after generating blocks_rpc.json #2

Closed jonjet closed 6 years ago

jonjet commented 6 years ago

After generating blocks_rpc.json, running exchange_with_ipc.py with the following command resulted in an error.

python exchange_with_ipc.py --ipc-path=~/.local/share/io.parity.ethereum/jsonrpc.ipc --input=blocks_rpc.json --output=blocks_rpc_output.json
Traceback (most recent call last):
  File "exchange_with_ipc.py", line 20, in <module>
    response = socket_exchange(args.ipc_path, ''.join(line_batch), args.ipc_timeout)
  File "/home/user/ethereum-etl/ethereumetl/socket_utils.py", line 13, in socket_exchange
    sock.connect(socket_path)
FileNotFoundError: [Errno 2] No such file or directory

medvedev1088 commented 6 years ago

@jonjet Thank you for reporting this issue.

Try using --ipc-path=$HOME/.local/share/io.parity.ethereum/jsonrpc.ipc - looks like the ~ is not expanded by the shell as it's in the middle of the string. I will update README also.
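The same problem can also be guarded against inside the script. A minimal sketch, assuming the raw --ipc-path value is available as a string; resolve_ipc_path is a hypothetical helper name, not part of the tool:

```python
import os


def resolve_ipc_path(raw_path):
    """Expand a leading ~ and any $VARS that the shell left untouched.

    Hypothetical helper for illustration; it is not part of ethereum-etl.
    """
    # expanduser only rewrites a ~ at the very start of the string,
    # which is exactly what argparse hands over for --ipc-path=~/...
    return os.path.expandvars(os.path.expanduser(raw_path))
```

With something like this in place, the value could be passed through resolve_ipc_path before it reaches socket_exchange, so both ~/... and $HOME/... forms would work.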

However, Parity is not supported by this tool at this point. The reason is that geth and Parity IPC work in slightly different ways, as I found:

web3py, for example, handles this by trying to parse the JSON after every recv. If a JSONDecodeError is raised, the response is not complete yet, so it keeps waiting for the remainder (https://github.com/ethereum/web3.py/blob/master/web3/providers/ipc.py#L175). Parsing JSON adds a lot of overhead, especially when the response is big and arrives in many parts. That's why I don't use web3py and wrote custom socket handling.
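The parse-after-every-recv approach can be sketched roughly as follows. This is an illustration of the idea only, not web3py's actual code; recv_json and chunk_size are names chosen here for the example:

```python
import json
import socket


def recv_json(sock, chunk_size=4096):
    """Read from sock until the accumulated bytes parse as one JSON document."""
    buffer = b""
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            raise ConnectionError("socket closed before a full JSON response arrived")
        buffer += chunk
        try:
            # Re-parses the whole buffer on every chunk: simple, but this is
            # the overhead described above when responses are large.
            return json.loads(buffer.decode("utf-8"))
        except ValueError:
            # JSONDecodeError (or a split multi-byte character): keep reading.
            continue
```

Each failed attempt re-scans everything received so far, so a response that arrives in many small parts is parsed many times over.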

I'll probably migrate to web3py, with a custom IPCProvider, once they support batch requests (https://github.com/ethereum/web3.py/issues/832). One way to optimize it would be to check whether the last bytes received from the socket are valid JSON terminating characters such as }, ], e, l, and only then try to parse the JSON: https://github.com/ethereum/web3.py/issues/842 https://github.com/ethereum/web3.py/pull/849
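A minimal sketch of that terminator check, assuming the received bytes are accumulated in a buffer; try_parse and JSON_TERMINATORS are hypothetical names, and this is not the code from the linked pull request:

```python
import json

# Final bytes named in the comment above: } and ] close objects and arrays,
# e ends true/false, l ends null. (Assumption: top-level numbers and strings
# are ignored, since JSON-RPC responses are objects or arrays.)
JSON_TERMINATORS = (b"}", b"]", b"e", b"l")


def try_parse(buffer):
    """Return (True, value) if buffer holds a full JSON document, else (False, None)."""
    # Cheap check first: skip the parser unless the last byte could end a document.
    if not buffer.rstrip().endswith(JSON_TERMINATORS):
        return (False, None)
    try:
        return (True, json.loads(buffer))
    except ValueError:
        # Ended on a terminator but is still incomplete, e.g. an inner ] or }.
        return (False, None)
```

The check is only a filter: a chunk can end on an inner ] or }, so the parse attempt is still needed, but most partial chunks are rejected without invoking the parser.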

Related: https://github.com/paritytech/parity/issues/4647 https://stackoverflow.com/questions/5034444/can-json-start-with

I have a few TODOs for this project:

  1. Unit tests
  2. Send batch requests http://www.jsonrpc.org/specification#batch.
  3. Support Parity
  4. Add HTTPProvider
  5. Error handling and logging

Any help would be appreciated.
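For TODO item 2, a batch per the JSON-RPC 2.0 specification is a JSON array of request objects. A rough sketch, assuming blocks are fetched with the standard eth_getBlockByNumber method; build_block_batch and match_responses are hypothetical names, not functions from this repository:

```python
import json


def build_block_batch(start_block, end_block, include_transactions=True):
    """Build one JSON-RPC batch (a list of requests) for an inclusive block range."""
    return [
        {
            "jsonrpc": "2.0",
            "method": "eth_getBlockByNumber",
            # Block numbers are passed as hex strings.
            "params": [hex(block_number), include_transactions],
            "id": block_number,
        }
        for block_number in range(start_block, end_block + 1)
    ]


def match_responses(batch, responses):
    """Reorder responses to follow the request order, matching on id.

    The specification allows a server to return batch responses in any order.
    """
    by_id = {response["id"]: response for response in responses}
    return [by_id[request["id"]] for request in batch]


# One line of input for the node: the whole batch serialized as a JSON array.
payload = json.dumps(build_block_batch(0, 2))
```

Using the block number as the request id is a design choice made here for convenience; any unique id would do.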

jonjet commented 6 years ago

Just came across your article on Medium yesterday and wanted to say awesome work with the tool! I've been trying to solve the problem of getting this data off the Ethereum blockchain for analytics ever since I experimented with your ethereum-scraper a month back, and the performance of this version far exceeds the previous tool!

I've re-attempted this using geth and it works like a charm. So far everything works as intended.

I have a tech background, though I'm not a coder, so I think I'd be able to help with unit tests if you can describe what needs testing.

medvedev1088 commented 6 years ago

Thank you for the help!

I haven't done much programming in Python myself; most of my experience is in Java/Kotlin. For unit testing I think we can use pytest https://docs.pytest.org. It's the same tool that web3.py uses https://github.com/ethereum/web3.py. I suspect they already apply the best practices, so we could borrow configs, structure, and workflows from there.

A good start can be ethereumetl.service.EthErc20Processor.filter_transfer_from_receipt_log. The tests should give it test logs and check that the result is correct.
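The shape of such a test could look like the sketch below. The function under test here is a hypothetical stand-in written for the example, since the real filter_transfer_from_receipt_log may take different arguments and return a different type; only the ERC20 Transfer event topic is a fixed, well-known value.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC20 Transfer event topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def word_to_address(word):
    """Keep the last 20 bytes (40 hex characters) of a 32-byte hex word."""
    return "0x" + word[-40:].lower()


def filter_transfer_from_log(log):
    """Hypothetical stand-in: return a transfer dict, or None for other logs."""
    topics = log.get("topics", [])
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "token": log["address"].lower(),
        "from": word_to_address(topics[1]),
        "to": word_to_address(topics[2]),
        "value": int(log["data"], 16),
    }


def make_log(first_topic):
    """Build a test log: sender 0x11..11, recipient 0x22..22, value 1000."""
    return {
        "address": "0x" + "Ab" * 20,
        "topics": [first_topic, "0x" + "0" * 24 + "11" * 20, "0x" + "0" * 24 + "22" * 20],
        "data": "0x" + format(1000, "064x"),
    }


def test_transfer_log_is_extracted():
    transfer = filter_transfer_from_log(make_log(TRANSFER_TOPIC))
    assert transfer["token"] == "0x" + "ab" * 20
    assert transfer["from"] == "0x" + "11" * 20
    assert transfer["to"] == "0x" + "22" * 20
    assert transfer["value"] == 1000


def test_non_transfer_log_is_ignored():
    assert filter_transfer_from_log(make_log("0x" + "00" * 32)) is None
```

Saved as a file whose name starts with test_, this runs with the plain pytest command; the stand-in would be replaced by an import of the real function.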

Btw, I have a business idea: SaaS solution for analysts and developers - "SQL for Blockchain".

If you know any businessmen, project managers, or developers, including yourself, who might be interested in joining, or if you have any ideas, please let me know at evge.medvedev@gmail.com.

medvedev1088 commented 6 years ago

@jonjet I added support for Parity. Please help test when you have time.

I also changed the ERC20 export output format: the erc20_token, erc20_from, and erc20_to columns will be 20-byte addresses with all lower-case characters.