cablej / ransomwhere

The open ransomware payment tracker
https://ransomwhe.re
MIT License
90 stars 10 forks source link

Add missing Bitcoin prices to BTC-USD.csv #9

Closed jzelenjak closed 4 months ago

jzelenjak commented 6 months ago

Description of the problem

During my work with the Ransomwhere dataset, I have noticed that the first transaction in the current version of the Ransomwhere dataset is on 2012-03-09 03:31:54, which can be verified using the following Unix pipeline (here and further we assume the latest version of the JSON exported from the www.ransomwhe.re website is stored in the data.json file in the current directory):

jq '.[].transactions.[].time' data.json | sort | awk 'NR == 1 { print(strftime("%Y-%m-%d %H:%M:%S", $1)); }'

However, the first entry in the BTC-USD.csv file, which is used to convert the amount in BTC to USD, is 2014-09-17. As a result, transactions before this date result in 0 in the amount in USD, e.g. as can be seen here (can be verified using sth like jq '.' data.json | grep -C 10 '"amountUSD": 0$'):

image

In total, there were 640 transactions before September 17, 2014, which can be verified as follows:

jq '.[].transactions.[].time' data.json | awk '$1 < 1410904809 { print $0; }' | wc -l

Because of the missing dates, the total sum in BTC on the website is also lower than it should be:

[jegor@arch data]$ jq '.[].transactions.[].amount' data.json | paste -d '+' -s | bc | awk '{ printf "%f\n", $1 / 100000000; }' # / 100000000 is to convert Satoshi to Bitcoins
82939.629233

(while the website says ₿53,179.76) image

The total sum in USD is the same as on the website, though:

[jegor@arch data]$ jq '.[].transactions.[].amountUSD' data.json | paste -d '+' -s | bc | awk '{ printf "%f\n", $1; }'
371767884.130631

Description of the pull request

This pull request adds the missing Bitcoin prices before September 17, 2014 to the BTC-USD.csv file for conversion.

First, I have exported the Bitcoin market price from https://coinmarketcap.com/currencies/bitcoin/historical-data/ (the first entries seemed to be the same as in BTC-USD.csv file) as CSV file (although semicolon was used as a delimiter) in the period from February 24, 2012 until September 16, 2014. The format required by the current BTC-USD.csv file is the following:

Date,Open,High,Low,Close,Adj Close,Volume
2014-09-17,465.864014,468.174011,452.421997,457.334015,457.334015,21056800
...

The format of the exported CSV file is the following:

timeOpen;timeClose;timeHigh;timeLow;name;open;high;low;close;volume;marketCap;timestamp
"2014-09-16T00:00:00.000Z";"2014-09-16T23:59:59.999Z";"2014-09-16T02:39:00.000Z";"2014-09-16T23:04:00.000Z";"2781";474.8619995117;475.641998291;465.1279907227;466.0570068359;16797300;6187570806.23;"2014-09-16T23:59:59.999Z"

I have used some command-line processing using awk to convert the exported CSV file (called missing_bitcoins.csv here) to the format of theBTC-USD.csv file:

awk 'NR > 1 { print $0; }' missing_bitcoins.csv | tr -d '"' | awk -F';' '{ printf("%s,%f,%f,%f,%f,%f,%d\n", $2, $6, $7, $8, $9, $9, $10); }' | sed 's/\(....-..-..\)T23:59:59.999Z,/\1,/' | sort

The format is the same as required (note, I have noticed that "Adj Close" and "Close" are the same in BTC-USD.csv, which can be verified with awk -F, '$5 != $6 { print $0; }' BTC-USD.csv, which prints only the header):

Now, the dates in the BTC-USD.csv file are from February 24, 2012:

2012-02-24,4.669715,6.250507,4.669715,5.000865,5.000865,89871
2012-02-25,4.993149,5.221826,4.238415,5.085299,5.085299,85970
...
jzelenjak commented 6 months ago

I have also noticed that "Volume" column has 0s for a large part of 2013. I am not sure if this is a problem, but I have not seen this column being used anywhere

cablej commented 4 months ago

Thank you!