edgi-govdata-archiving / web-monitoring

Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")
Creative Commons Attribution Share Alike 4.0 International

GSoC Report for Week 4 Phase 1 #54

Closed by janakrajchadha 7 years ago

janakrajchadha commented 7 years ago

https://hackmd.io/s/BkDjR4cXZ @suchthis @mhucka @danielballan @dcwalk Here is my report for this week. Please review :)

dcwalk commented 7 years ago

Just pinging @suchthis!

Would it be possible to edit your post to copy in the text here? I know sometimes linked out docs get lost (or in my case I forget someone is linking to them and I delete 'em 😊) and I'd love to keep a record of all your progress!

suchthis commented 7 years ago

Ah, yes, +1 to @dcwalk's comment. Please do copy/paste the body of your report right here in the issue so we can read it here. It looks like Dawn added the GSoC labels to your issue; are you unable to add them yourself? We'll look into permissions.

Otherwise, the status report looks great, and I appreciate your commitment to making good progress!

suchthis commented 7 years ago

@janakrajchadha you should now be able to add labels and assign reviewers to issues in this repo.

mhucka commented 7 years ago

I read the report and IMHO it is good! Thanks for writing it.

And I also agree with @dcwalk and @suchthis about copying the report into the issue text itself.

janakrajchadha commented 7 years ago

I've shared an editable link in the Slack chat. Do you want me to copy the entire text as it is?

janakrajchadha commented 7 years ago

Do we need an assignee on this issue?

mhucka commented 7 years ago

@janakrajchadha I think there's a misunderstanding, so let me try again. The point of putting the text of the weekly report into the issue itself is that it makes the issue self-contained. If the issue links to an external file, there's a danger that at some time in the future, the content at the link will have changed or disappeared altogether. To guard against that, for the final version of a weekly report, copy the text of the report into the GitHub issue – don't simply put a link to an external document. (You can edit the text anywhere you like, and just copy the final text into the GitHub issue.)

Does that make more sense? We don't need an editable version, or a link to an editable hackmd document.

(Maybe it was confusing that we created a hackmd template in the first place. The template was in hackmd only as a convenience, not an indication that reports necessarily had to be written there. We would have set up a template for GitHub issues if we could, but unfortunately, GitHub issue templates are global to a repository and doing so would have created a problem for non-GSoC issues.)

janakrajchadha commented 7 years ago

@mhucka Thanks for the clarification. The idea behind making an issue self-contained is something I hadn't thought of earlier. I'll copy the text and post it here. I used hackmd because I find it really helpful to create documents which are clean and crisp.

On a lighter note, we can consider storing a version of the hackmd report using one of our tools. 😄

janakrajchadha commented 7 years ago

GSoC Report for Week 4 of Phase 1

I'm excited to have made progress on understanding the diff response of the PageFreezer API and adding missing functionality to the PageFreezer version code. My main focus this week was understanding the different sources of data and devising an initial plan around the creation and storage of a dataset. This will include a pre-filter which will eliminate trivial changes based on defined rules. We discussed the action plan in a call and decided on its initial steps.

I worked on the following Issues:

I submitted the following PRs:

In addition, I reviewed "ENH: Minimal plugable diffing server." I also took a look at @patcon's chatbot and discussed the possibility of adding NLP functionality to it after GSoC.

Finally, I'm concerned about the small lag in progress. I plan to invest more time in the coming weeks to make up for it and believe that the GitHub issues and milestones will help me keep track of where I am. It would be really helpful if all GSoC mentors could give me quick feedback if and when they notice a lag in work or any other issue.

It's been wonderful working with EDGI so far and I'm learning a lot from this amazing community on top of the lessons from the project work.

patcon commented 7 years ago

Chiming in, just in case the "paste into issue" informs more general best-practice norms:

re: HackMD vs issues for some purposes. Unlike an issue comment, HackMD is editable by any user, so clean-up and corrections don't need to bottleneck on the OP. This can sometimes be more agile, cutting down on the back-and-forth that can make issues blow up.

(Not suggesting anything change here!)

mhucka commented 7 years ago

@patcon It's a fair point, though I wonder how much editing would likely happen on weekly reports? But in general, it's true that this is a tradeoff that should be considered.

janakrajchadha commented 7 years ago

@suchthis Can we close this issue?