snorreralund opened 5 years ago
If we connect to several different data sources using the connector (with different call ids), should we then present each data source in its own graph? And what about connections we only use once? E.g. we connect to a website once, but connect to a financial API 200 times.
Hi Snorre, There is a lot of confusion regarding the log. Do you want us to plot all of the figures mentioned above and hand them in, either in the paper or in the appendix? If so, we need to run all of our code again, which means we would extract some newer articles than the ones we use in our analysis. Is this a problem, or should we keep our analysis as it is and make the figures based on the new log?
The objective of the analysis of the log is to document data quality. This means being transparent about your data collection. Analytically, you look for signs of potentially systematic missing data (certain error codes being systematically distributed in part of the scrape, or holes in the time series indicating an error in the scraping program) and for artifacts (suspiciously similar response sizes or suspiciously short responses).
Analyze systematic connection errors / error codes and check for systematically missing data.
Look for artifacts and potential signs of different HTML formatting. Systematically different formatting of the HTML will probably force you to design two or more separate parsing procedures.
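A minimal sketch of what such a log inspection could look like, assuming the connector writes a delimited log file with columns such as `timestamp`, `call_id`, `response_code` and `response_size` (the actual column names and separator in your log may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names (timestamp, call_id, response_code, response_size) and the
# separator are assumptions -- adjust them to your own connector log.
log = pd.read_csv('scrape_log.csv', sep=';')

# 1) Error codes: are certain codes concentrated in one part of the scrape?
print(log['response_code'].value_counts())

# Parse timestamps and sort so holes in the time series become visible.
log['timestamp'] = pd.to_datetime(log['timestamp'])
log = log.sort_values('timestamp')

# 2) Holes in the time series: large gaps between consecutive calls can
#    indicate that the scraper crashed or was blocked for a while.
gaps = log['timestamp'].diff()
print(gaps.describe())
print(log[gaps > gaps.median() * 10])  # calls following unusually long pauses

# 3) Artifacts: suspiciously short or suspiciously similar responses show up
#    as mass at the low end or spikes in the response size distribution.
log['response_size'].hist(bins=100)
plt.xlabel('response size')
plt.ylabel('count')
plt.show()

# One figure per data source (call id), as discussed above.
for call_id, group in log.groupby('call_id'):
    group.plot(x='timestamp', y='response_size', title=f'call_id: {call_id}')
    plt.show()
```

If error codes or gaps cluster in particular periods or for particular call ids, that is exactly the kind of pattern to report.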
If any problems are present, you get the chance to demonstrate your serious attitude towards methodological issues. You should sample anomalies (e.g. breaks in the time series, suspiciously small response lengths, or responses that are too similar, such as a standard empty response) and inspect them manually to find the explanation (report this). If it is a real issue, think about the potential consequences (if any) for your analysis, and comment on potential causes and explanations, thereby demonstrating strong methodological scraping skills.
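A hedged sketch of how one might pull such anomalies out for manual inspection, continuing from the `log` DataFrame above (column names are again assumptions, and the thresholds are arbitrary starting points):

```python
# Suspiciously short responses: the bottom 1% of response sizes.
short = log[log['response_size'] < log['response_size'].quantile(0.01)]
print(short.sample(min(10, len(short))))

# Suspiciously similar responses: sizes that repeat very often are
# candidates for a standard "empty" or error page.
size_counts = log['response_size'].value_counts()
repeated = size_counts[size_counts > 50].index  # threshold is arbitrary
suspect = log[log['response_size'].isin(repeated)]
print(suspect.sample(min(10, len(suspect))))

# Calls right after an unusually long pause (possible break in the time series).
gaps = log['timestamp'].diff()
print(log[gaps > gaps.median() * 10].head(10))
```

From the sampled rows you can open the corresponding responses manually and report what you find.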