Closed vsoch closed 6 years ago
And by the way, this is really cool! Given standard inputs and outputs, a lot of cool visualization tools could be built around this!
Hi @vsoch, I just wanted to stop by and say thank you for the valuable feedback and review. I just moved to a new house and am up against a couple of academic deadlines :weary:, but I intend on updating the software according to your input as soon as I can. Thanks again!
Congratulations on your move!! No worries, please take the time you need. I'm in the middle of moving too (the truck is still... who knows where...) and I can completely empathize. I'm mostly avoiding thinking about the "I need to unpack when things finally arrive" part crawls into fetal position and thinks of someday-when-I-am-not-moving
I've added a commit that has addressed all of your comments, @vsoch. Please review and let me know what you think when you have time. Thanks again for the helpful and constructive feedback! I'm happy to see DeGAUSS
improving as a result of the peer review process. I've outlined my changes below, roughly in order of your review above:
DeGAUSS/geocoder
is hosted in a separate GitHub repo. This provides separation between "geocoding" and "geomarker assessment". The matched tags are achieved through automated builds on DockerHub triggered by a new tagged release on GitHub.CB::CBapply
my_address_file.csv
with address
columnDeGAUSS
tools in separate containers because their dependencies are quite different and DeGAUSS
is designed to be extendable by the user for other geomarker datasources and assessment toolkits. The geospatial datasets used are commonly > 1 GB and keeping each geomarker in a separate container not only reduces the image size, but more importantly, keeps the entire system modular and more reproducible.CONTRIBUTING.md
Awesome! Will have some time to look later today, stay tuned!
Hey @cole-brokamp ! Apologies for the delay, my state (NC) is preparing for a storm of doom and everyone is freaking out, myself included because I evacuated my apartment this morning after the predicted storm category increased and debating for a bit. Now I'm at least somewhere with a generator, and will be assured internet connectivity! I can proceed with the review!
You have responded very well to all of my points, and I would suggest this paper for publication. I will go back to the original thread to look over the paper once more.
Thank you for the quick followup and I hope that you are staying safe!! I had one question about implementing tests. This is something that I want to do, but am struggling with the specifics. Do you think it would be wise to implement some formal unit tests? Or are you thinking more along the lines of "Here is a sample address file and if you run it through these three containers using this specific code, then here is what the output should look like at each step"? Does that make sense? I appreciate any insight you might have.
hey @cole-brokamp! You would minimally want those functions run against the data files on each commit to master, using a service like Travis CI or Circle CI. It’s something you don’t have to think about, and any user can run the equivalent tests manually. I can offer to create a simple setup for you (and issue a PR, you would need to officially connect the service from your web interface) to help you learn by just seeing what I mean. What do you think?
I see. I am familiar with tests in R, but have no experience creating test for bash commands. I don't want to create any additional work for you -- can you point me in a helpful direction and I can ask you for more help if I need it?
Sure! Here you go https://circleci.com/docs/2.0/getting-started/ what you can maximally do is set up a build test and deploy cycle where the containers will be built and deployed to Docker Hub after passing the tests. But if you prefer to just push manually this is overkill, and you can just have a simple setup that builds the containers and runs the same commands the user would. You basically are going to write a .circleci/config.yml file that will issue these commands to test! Yaml and their workflows are a bit funky so please ask me if you need help. I’ve made these so many times I can do it pretty quickly. Good luck!
thanks! I will dive in this weekend and let you know how it goes
I (finally!) got time to add tests through Travis. Each of the three images is built and a command is ran on each using a sample geocoded address file and Travis checks to make sure the expected output file is present. Thanks for your help in teaching me about this!
Fantastic work, and your tests just pass! This definitely meets all the criteria for publication, and after the fact when you feel up for it, you can give a go at adding a deploy step so in the future the docker hub builds are just magically done for you (tagged with either a software version or Github commit). I'll update the main thread to let the editors know! I'm going to close here.
Review
This is software that can calculate coordinates based on addresses, provided in a container for reproducible use. It's well documented and described. My main comments have to do with clearing up some of the examples and description. I understand the use of containers for reproducibility, but it's not clear to me how geocoding an address (that can be reversed) somehow ensures privacy. Please read my comments below for details about how it might be better described (in human friendly terms for the layperson) and thank you for the opportunity to review this work. [Accept]
Criteria
Does the paper ONLY include these things?
[x] A clear statement of need that illustrates the purpose of the software
General Review of Paper
Here are a few things I genuinely don't understand / know what they are based on not being a part of the field:
Ah, here is where you start to explain it well:
The figure is nice, but a caption that more clearly explains it would be great!
Review items
Software
Update September 10, 2018 The versions are coinciding with tags in a different repository, via an automated docker build. This is okay.
Software license
Testing the software, I get these warning messages:
Update September 10, 2018 This warning message has been suppressed.
Given the container, has the author considered following the suggestion to eliminate the warning?
Documentation
Here I am reviewing the README.md in the main repository.
Update September 10, 2018 The about section and example usage are fabulous.
One quick suggestion - the author provides a sample file with the column name
address
and then the example command is:I think it would make sense to 1) tell the user to name the file "my_address_file.csv" and then have the same column name (one or the other) so literally the user can copy paste the command (without change) from the $PWD with the file and it will work.
What isn't clear from this example is what I just did. I converted a file of addresses to... another file? Maybe describing this in the context of an actual example (e.g., "Let's say you have these addresses that represent... and you want to..." here is how you would do this.).
Update September 10, 2018 This file naming has also been updated and fixed.
References
There are a bunch of empty lines in the references (maybe authors?) that might be looked at.
Community guidelines