Closed mikelambert closed 6 years ago
Actually, my prioritization logic appears flawed:
...
"BTG",2017-10-20,0.819804,1.2,0.80772,1.19,80,0,"Bitcoin Gold",11
"BTG",2017-10-21,0.873455,1.25,0.862919,0.991196,59,0,"Bitcoin Gold",11
"BTG",2017-10-22,1.01,2.09,0.844422,1.7,1756,0,"Bitcoin Gold",11
"BTG",2017-10-23,479.82,539.72,479.82,500.13,7652060,0,"Bitcoin Gold",11
"BTG",2017-10-23,1.7,13.43,1.11,7.04,41557,0,"Bitcoin Gold",11
...
"BTG",2017-11-23,241.97,299.89,241.97,293.61,154038000,0,"Bitcoin Gold",11
"BTG",2017-11-23,5.84,6.45,5.72,5.9,7800,345650,"Bitcoin Gold",11
"BTG",2017-11-24,295.75,413.74,284.26,394.22,537472000,0,"Bitcoin Gold",11
"BTG",2017-11-24,5.89,7.96,4.1,5.42,10850,348941,"Bitcoin Gold",11
"BTG",2017-11-25,394.04,394.04,339.1,356.04,208662000,0,"Bitcoin Gold",11
"BTG",2017-11-25,5.4,6.64,4.68,5.68,7480,320152,"Bitcoin Gold",11
"BTG",2017-11-26,355.72,366.79,334.74,366.79,141228000,5930460000,"Bitcoin Gold",11
"BTG",2017-11-26,5.68,6.39,4.36,5.31,3204,336402,"Bitcoin Gold",11
"BTG",2017-11-27,370.18,387.88,353.67,359.25,129160000,6172140000,"Bitcoin Gold",11
"BTG",2017-11-27,5.31,5.5,4.39,5.43,8423,314816,"Bitcoin Gold",11
...
So: from 2017-11-26 the higher-priced series also has a non-zero `market` value, corresponding to coinmarketcap showing a non-zero marketcap. Which is great, since I assume that's when the coins came into existence and the real market started. Unfortunately, there are still two datapoints per date, both with non-zero `market` values, making it difficult to tell which one I should be using. I assume the "largest" one makes the most sense.
Am I parsing this data wrong, and is there a better way to deal with these duplicate timeseries, or is some extraneous data creeping in here? Thanks!
Ooooh, sorry, I figured out that this is due to coins on coinmarketcap that share a ticker. PRO, BTG, ACC, etc.
Not sure of a correct way to distinguish them in the dataset... especially since the `name` column appears to choose an arbitrary coin instead of naming both coins. For example, there are only datapoints for Bitcoin Gold (instead of Bitgem), Propy (instead of ProChain), etc.
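For anyone hitting the same thing, here is a minimal Python sketch of how the shared-ticker rows can be spotted: count rows per (symbol, date) pair and report any date that appears more than once. The column order and the `colliding_symbols` helper are assumptions inferred from the BTG rows quoted in this thread, not something the dataset documents.

```python
import csv
import io
from collections import Counter

# Column order is an assumption inferred from the rows quoted in this thread.
COLUMNS = ["symbol", "date", "open", "high", "low", "close",
           "volume", "market", "name", "ranknow"]

# Three of the BTG rows quoted in this thread.
SAMPLE = '''"BTG",2017-10-22,1.01,2.09,0.844422,1.7,1756,0,"Bitcoin Gold",11
"BTG",2017-10-23,479.82,539.72,479.82,500.13,7652060,0,"Bitcoin Gold",11
"BTG",2017-10-23,1.7,13.43,1.11,7.04,41557,0,"Bitcoin Gold",11
'''

def colliding_symbols(rows):
    """Return {symbol: [dates that have more than one row]} (hypothetical helper)."""
    counts = Counter((r["symbol"], r["date"]) for r in rows)
    collisions = {}
    for (symbol, date), n in sorted(counts.items()):
        if n > 1:
            collisions.setdefault(symbol, []).append(date)
    return collisions

rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=COLUMNS))
print(colliding_symbols(rows))  # {'BTG': ['2017-10-23']}
```

This only flags the collision; it cannot say which coin each row belongs to, since both rows carry the same symbol and name.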
Hi Mike, you're spot on: the issue is due to several tokens sharing the same symbol. I didn't know how to go about resolving it, but then figured I'd take the slug I'm already using to generate the URLs for scraping and use that as a unique identifier instead.
The change I just committed should resolve the duplication issues, and there are also a couple of extra features included.
Let me know how you go, thanks.
> head(pro)
slug symbol name date ranknow open high low close volume market close_ratio spread
1 propy PRO Propy 2017-09-19 295 0.823919 0.858425 0.628423 0.745318 26854 11582000 0.5082 0.23
2 propy PRO Propy 2017-09-20 295 0.744813 0.933790 0.644857 0.862584 102433 10470000 0.7536 0.29
3 propy PRO Propy 2017-09-21 295 0.859565 0.982731 0.743939 0.809898 74579 12083100 0.2762 0.24
4 propy PRO Propy 2017-09-22 295 0.773040 0.792471 0.588509 0.658002 136747 10866800 0.3407 0.20
5 propy PRO Propy 2017-09-23 295 0.657034 1.470000 0.559158 0.724104 298708 9236070 0.1811 0.91
6 propy PRO Propy 2017-09-24 295 0.731472 0.734890 0.571775 0.615710 204870 10282500 0.2693 0.16
> tail(pro)
slug symbol name date ranknow open high low close volume market close_ratio spread
122 prochain PRO ProChain 2017-12-28 1096 0.360183 0.365566 0.331167 0.354053 626739 0 0.6653 0.03
123 prochain PRO ProChain 2017-12-29 1096 0.352030 0.401564 0.345843 0.357103 523329 0 0.2021 0.06
124 prochain PRO ProChain 2017-12-30 1096 0.355327 0.358116 0.307769 0.326998 661599 0 0.3819 0.05
125 prochain PRO ProChain 2017-12-31 1096 0.328685 0.378672 0.324813 0.358117 888244 0 0.6184 0.05
126 prochain PRO ProChain 2018-01-01 1096 0.358136 0.358136 0.331516 0.345254 1424280 0 0.5161 0.03
127 prochain PRO ProChain 2018-01-02 1096 0.345629 0.446480 0.345629 0.417606 4645990 0 0.7137 0.10
Awesome, the slug works great, thank you!
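In case it helps others, a small Python sketch of keying on the slug rather than the symbol. The rows are copied from the `head(pro)` / `tail(pro)` output above and reduced to (slug, symbol, date, close); that tuple layout and both helper names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# A few rows from the head(pro)/tail(pro) output, reduced to
# (slug, symbol, date, close). The tuple layout is an assumption.
ROWS = [
    ("propy", "PRO", "2017-09-19", 0.745318),
    ("propy", "PRO", "2017-09-20", 0.862584),
    ("prochain", "PRO", "2018-01-01", 0.345254),
    ("prochain", "PRO", "2018-01-02", 0.417606),
]

def series_by_slug(rows):
    """Group rows into one date-ordered close-price series per slug."""
    series = defaultdict(list)
    for slug, _symbol, date, close in rows:
        series[slug].append((date, close))
    for points in series.values():
        points.sort()  # ISO dates sort chronologically as strings
    return dict(series)

def slugs_per_symbol(rows):
    """Map each symbol to the sorted list of slugs that share it."""
    mapping = defaultdict(set)
    for slug, symbol, _date, _close in rows:
        mapping[symbol].add(slug)
    return {symbol: sorted(slugs) for symbol, slugs in mapping.items()}

print(slugs_per_symbol(ROWS))               # {'PRO': ['prochain', 'propy']}
print(len(series_by_slug(ROWS)["propy"]))   # 2
```

Grouping by slug gives one series per coin even when the symbol is shared, so no "pick the largest market value" heuristic is needed.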
Thanks for doing this work, greatly appreciated.
Was going to do some analysis of it on my own, and got confused by the presence of duplicates on some coins.
For example, up until 12-12 we have one datapoint per date, whereas after that there are two datapoints per date:
Only one has a non-empty `market` value... so I'm going to go with that. (I assume `market` refers to market-cap? I thought at first it might be showing data from two different market exchanges or something.)