Open yakra opened 5 years ago
Thanks. I think I got it. Comparing /home/<user>/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
to /home/www/tm/logs/datacheck.log
.
The first directory is relative to datacheck.sh
and the 2nd one would need an absolute hard-coded path. I don't see any issue as long as you catch an error when /home/www/tm/logs/datacheck.log
would not exist. Or add an option, e.g. -c
(compare) to datacheck.sh
so that the comparison is not done by default, e.g. if someone sets up their own server. I never type the three lines into Bash; I only copy them. It would be no additional step for me. Nothing I could forget.
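A minimal sketch of the opt-in comparison suggested above. The function names `compare_datacheck` and `parse_compare_flag` are hypothetical and not part of datacheck.sh; the two log paths are passed in rather than hard-coded. If either file is missing, the comparison is skipped with a message instead of failing.

```sh
#!/bin/sh
# Hypothetical helper: compare the published log ($1) with the freshly
# generated one ($2), but only if both files exist.
compare_datacheck() {
    published=$1
    fresh=$2
    for f in "$published" "$fresh"; do
        if [ ! -f "$f" ]; then
            echo "Skipping comparison: $f not found" >&2
            return 0
        fi
    done
    # diff exits with 1 when the files differ; that is not an error here.
    diff "$published" "$fresh" || [ $? -eq 1 ]
}

# Hypothetical option handling: -c turns the comparison on.
# Sets do_compare to 1 if -c was given, 0 otherwise.
parse_compare_flag() {
    do_compare=0
    OPTIND=1
    while getopts c opt; do
        case $opt in
            c) do_compare=1 ;;
            *) return 2 ;;
        esac
    done
}
```

With something like this, the script could call `parse_compare_flag "$@"` near the top and run `compare_datacheck` at the end only when `do_compare` is 1.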
Why did I go straight to fpcull when a simple DIFF will do? :P
If you're in /home/michih/DataProcessing/siteupdate/python-teresco
when you execute datacheck.sh, you can do diff /home/www/tm/logs/datacheck.log logs/datacheck.log
to output the diff to Bash.
If you want to save the diff as a file, diff /home/www/tm/logs/datacheck.log logs/datacheck.log | tee newdatacheckentries.diff
or somesuch.
The first doesn't work: No such file or directory
[michih@noreaster ~]$ diff /home/www/tm/logs/datacheck.log logs/datacheck.log
diff: logs/datacheck.log: No such file or directory
The second outputs strange stuff but not what I need - the content of the file - and Permission denied to "my own" file:
[michih@noreaster ~]$ diff /home/www/tm/logs/datacheck.log | /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
-bash: /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log: Permission denied
usage: diff [-aBbdilpTtw] [-c | -e | -f | -n | -q | -u] [--ignore-case]
[--no-ignore-case] [--normal] [--strip-trailing-cr] [--tabsize]
[-I pattern] [-L label] file1 file2
diff [-aBbdilpTtw] [-I pattern] [-L label] [--ignore-case]
[--no-ignore-case] [--normal] [--strip-trailing-cr] [--tabsize]
-C number file1 file2
diff [-aBbdiltw] [-I pattern] [--ignore-case] [--no-ignore-case]
[--normal] [--strip-trailing-cr] [--tabsize] -D string file1 file2
diff [-aBbdilpTtw] [-I pattern] [-L label] [--ignore-case]
[--no-ignore-case] [--normal] [--tabsize] [--strip-trailing-cr]
-U number file1 file2
diff [-aBbdilNPprsTtw] [-c | -e | -f | -n | -q | -u] [--ignore-case]
[--no-ignore-case] [--normal] [--tabsize] [-I pattern] [-L label]
[-S name] [-X file] [-x pattern] dir1 dir2
I've deleted the first "ab" lines of datacheck.log to have a diff.
The first doesn't work: No such file or directory
My bad, my assumption above was incorrect; I see that you're in [michih@noreaster ~]
, and not /home/michih/DataProcessing/siteupdate/python-teresco
.
The second outputs strange stuff but not what I need - the content of the file - and Permission denied to "my own" file:
[michih@noreaster ~]$ diff /home/www/tm/logs/datacheck.log | /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
The problem here is the |
character -- you're in effect trying to take a diff of a single file, and pipe the output of diff
(which is itself receiving invalid input) to /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
, which is not an executable.
I guess using absolute hard-coded paths would be more foolproof:
diff /home/www/tm/logs/datacheck.log /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
or to save the diff as a file,
diff /home/www/tm/logs/datacheck.log /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log | tee newdatacheckentries.diff
etc.
It works 😃
[michih@noreaster ~]$ diff /home/www/tm/logs/datacheck.log /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log
1c1
< Log file created at: 2018-12-27 08:41:07.235800
---
> Log file created at: 2018-12-28 06:55:14.700131
5,9d4
< ab.ab501;AB41;AB/SK;;VISIBLE_DISTANCE;11.61
< ab.ab501;RR51;AB41;;VISIBLE_DISTANCE;15.09
< ab.ab509;AB511;AB3;;VISIBLE_DISTANCE;28.57
< ab.ab511;RR253;AB509;;VISIBLE_DISTANCE;15.54
< ab.ab579;AB40;RR64A;;VISIBLE_DISTANCE;11.94
618,621d612
< deunw.l182wes;PetHenStr;K31;L184;SHARP_ANGLE;151.71
< deunw.l364;L47_S;L225_E;L225_W;SHARP_ANGLE;146.42
< deunw.l531sie;L531;;;LABEL_SELFREF;
< deunw.l703;L703;;;LABEL_SELFREF;
I think I'll always copy these lines to Bash after every data check now. It would be great to get it automatically after the "Data check successful" output one day.
Is there a chance to add a "NEW" or "DEL" for each line to indicate whether the line is new or deleted (and I should remove it from FPs)
It would be great to get it automatically after the "Data check successful" output one day.
This would be easy to add to datacheck.sh, etc. Ping @jteresco ?
Is there a chance to add a "NEW" or "DEL" for each line to indicate whether the line is new or deleted
The < character in the diff output indicates lines in the first argument to diff, /home/www/tm/logs/datacheck.log in the above example.
The > character indicates lines in the second argument to diff, /home/michih/DataProcessing/siteupdate/python-teresco/logs/datacheck.log in the above example.
(and I should remove it from FPs)
FPs are excluded from datacheck.log, and thus won't appear in the diff.
diff /home/www/tm/logs/unmatchedfps.log /home/michih/DataProcessing/siteupdate/python-teresco/logs/unmatchedfps.log
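For anyone who would rather see words than the < and > markers, here is a small sketch that relabels them. The function name `label_diff` is hypothetical; it assumes the older log is given first and the newer one second, as in the commands above.

```sh
#!/bin/sh
# Hypothetical helper: prefix lines only in the older log ($1) with DEL
# and lines only in the newer log ($2) with NEW. The hunk headers that
# diff prints (such as 5,9d4) are dropped.
label_diff() {
    diff "$1" "$2" | sed -n -e 's/^< /DEL /p' -e 's/^> /NEW /p'
}
```

Note that a changed line, such as the "Log file created at" timestamp, shows up as a DEL/NEW pair.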
">"/"<": Thanks, I had only ">" but I see it now 👍 New unmatched FPs diff would be great too.
No objection to automatic diffs, but we'll just have to make sure files exist before trying a diff on them so the script won't crash, say, on a first run in a given account.
My personal perspective: I'm not sure how much I'd use this, and am content to run diff manually when I want that info. I don't really use datacheck.sh; I just run siteupdate.py itself with whatever arguments I want/need at the time. I'll leave this to those who'd find it more useful, and/or know more about shell scripts than I do.
@michihdeu wrote:
It would be great to get it automatically after the "Data check successful" output one day.
Easy!
@jteresco wrote:
No objection to automatic diffs, but we'll just have to make sure files exist before trying a diff on them so the script won't crash, say, on a first run in a given account.
Having datacheck.sh diff $logdir/datacheck.log
should not be a problem, as it will have just been created by the script.
For those running datacheck.sh on noreaster, /home/www/tm/logs/datacheck.log
will work fine.
But for those running on another machine or home system, not so much.
A more foolproof solution, that can be added to the end of datacheck.sh:
echo -e "\nNew datacheck entries:"
diff <(curl -s http://travelmapping.net/logs/datacheck.log) $logdir/datacheck.log | grep '^>' | sed 's~^> ~~'
Note though, that this only lists the newly added entries.
@michihdeu wrote:
Is there a chance to add a "NEW" or "DEL" for each line to indicate whether the line is new or deleted (and I should remove it from FPs)
< deunw.l182wes;PetHenStr;K31;L184;SHARP_ANGLE;151.71
< deunw.l364;L47_S;L225_E;L225_W;SHARP_ANGLE;146.42
< deunw.l531sie;L531;;;LABEL_SELFREF;
< deunw.l703;L703;;;LABEL_SELFREF;
I see the utility in this, too. List the deleted entries, to make sure that what's supposed to be deleted is deleted, and if it's ready to be removed from FPs.
IMO the most readable way to organize the old & new datachecks is all together, rather than in a line-by-line diff...
Removed datacheck entries:
deunw.l182wes;PetHenStr;K31;L184;SHARP_ANGLE;151.71
deunw.l364;L47_S;L225_E;L225_W;SHARP_ANGLE;146.42
deunw.l531sie;L531;;;LABEL_SELFREF;
New datacheck entries:
ab.ab501;AB41;AB/SK;;VISIBLE_DISTANCE;11.61
ab.ab501;RR51;AB41;;VISIBLE_DISTANCE;15.09
ab.ab509;AB511;AB3;;VISIBLE_DISTANCE;28.57
(If we don't want to bother with VISIBLE_DISTANCE, we could even add an option to filter them out, by tacking a | grep -v VISIBLE_DISTANCE
onto the end of the diff command...)
We could either curl
the canonical datacheck.log from the web twice, or wget
it once & save to a temporary file, whatever's clever.
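A rough sketch of the grouped layout described above, with the canonical log downloaded only once to a temporary file. The function name `report_changes` is hypothetical; the download step is shown in comments so the function itself works on any two local files.

```sh
#!/bin/sh
# Hypothetical helper: print removed entries, then new entries.
# $1 is the canonical (older) log, $2 the freshly generated one.
report_changes() {
    # diff exits with 1 when the files differ; only treat >1 as failure.
    changes=$(diff "$1" "$2") || [ $? -eq 1 ] || return 2
    echo "Removed datacheck entries:"
    printf '%s\n' "$changes" | sed -n 's/^< //p'
    echo "New datacheck entries:"
    printf '%s\n' "$changes" | sed -n 's/^> //p'
}

# Example use, downloading the canonical log only once (URL from this thread):
#   canonical=$(mktemp)
#   curl -s http://travelmapping.net/logs/datacheck.log > "$canonical"
#   report_changes "$canonical" "$logdir/datacheck.log"
#   rm -f "$canonical"
```

The "Log file created at" line will appear in both lists; a `grep -v` on that prefix, or on VISIBLE_DISTANCE as mentioned above, could be tacked onto the end if that is unwanted.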
@michihdeu wrote:
New unmatched FPs diff would be great too.
This can also be done similarly.
This all assumes that your individual branch of the repo that gets pulled down at the start of the process https://github.com/TravelMapping/DataProcessing/blob/6e98c786abf729bf195473d9a9a4d0171f2e8c82/siteupdate/python-teresco/datacheck.sh#L30 is up to date with the newest changes in TravelMapping:master.
If not?
Not much to say about that, other than... Best practice if you wanna run siteupdate.sh is to make sure that your personal repo is up-to-date with the latest changes from TravelMapping:master.
^ Any thoughts on including
echo -e "\nNew datacheck entries:"
diff <(curl -s http://travelmapping.net/logs/datacheck.log) $logdir/datacheck.log | grep '^>' | sed 's~^> ~~'
at the end of datacheck.sh?
go for it!
Problem is though, if your repo isn't up-to-date with changes from TravelMapping:master, you'll see other users' datacheck errors that have been fixed in TravelMapping:master
Ping @jteresco, thoughts?
Mine is always up-to-date with master because it needs to be!
I didn't update my user repository from master yesterday. I only committed my updated user list file. On syncing, I got an error (debug with commands...). After updating, it worked. I have this issue quite often when I don't update although there is no reason at all - I only update michih.list which is never updated by anyone else on master.
If I had to delete my user repository fork and set it up again.... no big deal, I'd just do it. If I had a similar issue with hwy data and I had to delete my user repository fork.... Not worth the risk!
@michihdeu, I'm having trouble following your post...
I didn't update my user repository from master yesterday. I only committed my update user list file. On syncing, I got an error (debug with commands...). After updating, it worked. I have this issue quite often when I don't update although there is no reason at all - I only update michih.list which is never updated by anyone else on master.
I'm unsure what was happening here (Did you have trouble with https://github.com/TravelMapping/UserData/commit/b68a6a297d4e321741a0cadd5b38b4487e923576, https://github.com/TravelMapping/UserData/commit/2c2595534acfb00adfc941865c1da2bffef802a7 or https://github.com/TravelMapping/UserData/commit/2bf50afffafd81832f28990cc36488b0a22408f3?), but some months back I was experiencing errors about merge conflicts for unchanged files. Upgrading my git version cured this; maybe upgrading git could help you too?
There's always potential for merge conflicts with HighwayData updates -- either when merging master into our branches, or merging our branches into master. (At least in the latter case, those who are less comfortable with git can leave the conflict resolution to Jim or one of us or whoever.)
What I personally do to avoid the risk of having to delete and create my fork of the repo is, I keep my yakra:master
branch clean, and only use it for merging in the latest changes from TravelMapping:master
. I do my work in other branches, and make my pull requests from those. This way, if another branch gets FUBAR, I'll still have yakra:master
to fall back on.
Merge in TravelMapping:master
regularly to avoid seeing other people's errors. Unless... well, see below...
Most of this requires at least some level of comfort with git somewhere along the line...
Unless, Here's an idea...
This could help make the results more foolproof & noise-free for those (like me, actually) who don't sync their repos to master all that often.
Look for an optional myregions.cfg
or tmregions.cfg
or whatever we call it.
diff <(curl -s http://travelmapping.net/logs/datacheck.log) $logdir/datacheck.log | grep '^>' | sed 's~^> ~~' | grep -f <(cat myregions.cfg | tr -d '-' | tr '[:upper:]' '[:lower:]' | sed 's~\(.*\)~^\1\\.~')
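The region filter in that pipeline can be sketched on its own, without process substitution. The function name `filter_my_regions` and the file format (one region code per line, e.g. DEU-NW) are assumptions; no such file exists yet. The dot after the region code is escaped so that a region `ab` does not also match roots beginning with `abc.`.

```sh
#!/bin/sh
# Hypothetical helper: read datacheck lines on stdin and keep only those
# whose root belongs to a region listed in the file $1.
filter_my_regions() {
    patterns=$(mktemp)
    # DEU-NW -> ^deunw\.  (drop blank lines and hyphens, lowercase,
    # anchor at line start, escape the trailing dot)
    sed -e '/^$/d' -e 's/-//g' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/.*/^&\\./' > "$patterns"
    rc=0
    grep -f "$patterns" || rc=$?
    rm -f "$patterns"
    return $rc
}
```

The output of the diff/grep/sed part of the command could then be piped into `filter_my_regions myregions.cfg`.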
I'm unsure what was happening here
Thanks for asking, I finally got it while answering 😄
It was this pull request. I use Github Desktop on my PC and this was pressing the Update from TravelMapping/master
button. I don't know what's happening backend when pressing the button.
I've updated my user list file five days later(!) - assuming about 5 updates to master meanwhile - and committed. Then, I've pressed the Sync
button which updates my fork. I don't know if anything else happens backend.
Nevertheless, I got a sync error.
I had to press the Update from TravelMapping/master
button again and the sync worked afterwards. I could run datacheck;
cd ~/DataProcessing/siteupdate/python-teresco
git pull
sh datacheck.sh
to make sure that my user list file update was correct (no error in michih.log
).
This procedure - for user repo - did sometimes work but I usually press the Update from TravelMapping/master
button to avoid any trouble like the one described here.
What happened?
I've executed datacheck in-between. For this pull request to highway data, this and this. And the git pull
command updates my fork(!!). That means my Sync
would undo the changes to other user list files merged into master meanwhile.
Thanks for asking 👍 😄 😄
It's all off topic though 😆
Nope. git pull
does not update User nor Hwy data of my noreaster folder from Travelmapping/master
😞
Nope.
git pull
does not update User nor Hwy data of my noreaster folder from Travelmapping/master
😞
I've not yet read your post above that one, but...
I nosed around your home directory on noreaster. It looks like your repos there are both clones of michihdeu:master
rather than TravelMapping:master
:
[yakra@noreaster /home/michih/UserData]$ git branch
* master
[yakra@noreaster /home/michih/UserData]$ cat .git/refs/heads/master
ac421bcc2472c05308926e248297f37de4a6b5c6
That's the most recent commit in michihdeu:master right now. It was merged into TravelMapping:master in a more recent commit.
[yakra@noreaster /home/michih/HighwayData]$ git branch
* master
[yakra@noreaster /home/michih/HighwayData]$ cat .git/refs/heads/master
de7f837beb532be3c00bdc26a9bc5730618e6f08
That's the most recent commit in michihdeu:master right now, which is ahead of TravelMapping:master.
You would have to merge tm:master into your own fork, then push that commit back to GitHub, in whatever way that process normally works with GitHub Desktop. (Sync
button?)
Thanks. I don't get why my master is one commit ahead. I merged in three commits today the very same way. I don't think that @jteresco must merge anything additionally?
https://github.com/michihdeu/HighwayData/ says " This branch is 2 commits ahead of TravelMapping:master", but I think that's wrong:
https://github.com/michihdeu/HighwayData/commits only shows one commit, Merge remote-tracking branch 'refs/remotes/TravelMapping/master'
So your branch is ahead by one commit, one merge commit -- but all the highway data the repo contains, the WPTs & everything else, are identical. This merge commit is not in TravelMapping:master yet, but will be when your next pull request is merged.
The one commit contains the data from panda's pull request, which I merged into travelmapping:master before syncing my fork..... It's not important to understand it. Don't waste your time on it....
^^^^^^ Weird. I still don't get it. I almost thought for a second it had something to do with there being a more recent commit on your branch than other commits on tm:master -- like you said,
I've updated my user list file five days later(!) - assuming about 5 updates to master meanwhile
...But this shouldn't matter, if you're not changing anyone else's files and nobody else is changing your files. Git will (should?) be able to happily merge together all changes as long as there are no edits to the same line of the same file. In fact, when pushing new edits to tm:master, your fork will always be ahead: If you merge tm:master into your branch first, there will be a merge commit, like described above.
Nevertheless, I got a sync error.
Remember anything about what the error message said?
What happened?
I've executed datacheck in-between. For this pull request to highway data, this and this. And the
git pull
command updates my fork(!!).
git pull
updates your fork/clone on noreaster. It pulls down the latest changes that are on GitHub. It won't change any of the data on GitHub itself.
That means my
Sync
would undo the changes to other user list files merged into master meanwhile.
Not quite, it's just that their changes are not in your repo yet -- they simply haven't been made in your fork. The changes will show up in your fork once you merge tm:master in.
So I truly don't understand what was happening to you. Not knowing anything about the error message, I'll just chalk it up to bugs in git reporting merge conflicts when there were in fact none; I've seen this behavior before.
It's not important to understand it. Don't waste your time on it....
😆
Remember anything about what the error message said?
nope. I was totally pissed off because I thought that I can risk to commit and sync without update from travelmapping.master
first. The problem is, that the button is disabled. I need to wait about 5 minutes after starting the Git application. Then, it has recognized that there is newer data and I can manually update. It just takes a few seconds. The same story with each repository. And only the repo is checked which is selected. That means, I usually need to wait 5 minutes twice....
I still use an old version of the desktop application because I failed using the "new" one when I tried it first. 2.5 years ago.....
nope. I was totally pissed off because I thought that I can risk to commit and sync without
update from travelmapping.master
first.
Generally, you should be able to, especially with UserData where different people should not be editing the same files. Again... weird.
The problem is, that the button is disabled. I need to wait about 5 minutes after starting the Git application. Then, it has recognized that there is newer data and I can manually update. It just takes a few seconds. The same story with each repository. And only the repo is checked which is selected. That means, I usually need to wait 5 minutes twice....
Yuck. That sounds like a pain.
I still use an old version of the desktop application because I failed using the "new" one when I tried it first. 2.5 years ago.....
I don't have any personal experience with GitHub Desktop. I've been using GitKraken since March; before that, the GitHub web interface.
Anyway...
Adding diff <(curl -s http://travelmapping.net/logs/datacheck.log) $logdir/datacheck.log | grep '^>' | sed 's~^> ~~'
to datacheck, only to require a myregions.cfg
to make it work properly, may be more trouble than it's worth.
Running the command manually is easier than it looks. If every time I log into noreaster, I type only
cd TravelMapping/DataProcessing/siteupdate/python-teresco/
./datacheck.sh
diff <(curl -s http://travelmapping.net/logs/datacheck.log) logs/datacheck.log | grep '^>' | sed 's~^> ~~'
exit
...I can cycle back these 4 previous commands by pressing the up arrow 4 times after logging in. I won't have to retype it out, or paste in the command from this thread. Does that work well enough for your purposes?
I've been using GitKraken since March; before that, the GitHub web interface.
Is it easy to use? Self-explaining? Or a freek tool? 😉
Does that work well enough for your purposes?
The status quo is not perfect but... ok
Is it easy to use? Self-explaining? Or a freek tool? 😉
Easy to use; it has a pretty self-explanatory GUI. Some stuff is not supported though; for example I have to use the commandline if I want to move or rename a file. :( If you're already comfortable with GitHub Desktop, it's probably not worth the bother of getting familiar with a new program.
The status quo is not perfect but... ok
only to require a
myregions.cfg
to make it work properly, may be more trouble than it's worth.
I meant to expand on this more, but forgot. :( I think it might be near the edge of people's abilities, and add another layer of confusion for most contributors. First, there's how to create or get the file onto noreaster in the first place. Then, editing it if picking up regions from, or turning regions over to, another contributor. What are we gonna have people do, use emacs? One wrong keystroke and you're toast!
Rethinking this.
This idea of looking for new datacheck errors is probably the wrong approach.
Better to focus on filtering the data relevant to each contributor's regions, using a MyRegions
file as mentioned upthread.
For contributors like @michihdeu & me who always quickly fix our errors or mark them FP, any errors visible will be new ones. :)
https://forum.travelmapping.net/index.php?topic=4553.msg29376#msg29376
- Make dashboard.sh a canonical part of DataProcessing, in a location TBD.
- Add a command line switch to specify a log directory.
- If no regions are specified on the command line and no MyRegions file is found, terminate after giving instructions on how to create one.
- Add MyRegions to .gitignore, or else just instruct the script to always look for it in $HOME or something.
- Add a prompt at the end of datacheck.sh: Do you want to run the dashboard script? (Pressing Q will quit the text viewer.) Y/N:
- Pipe the output to less.
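One item from the list above, sketched in isolation: deciding which regions to use and stopping with instructions when none can be found. The function name `resolve_regions`, the `MYREGIONS_FILE` variable, the default location `$HOME/MyRegions`, and the one-code-per-line format are all assumptions; none of this is decided yet.

```sh
#!/bin/sh
# Hypothetical helper: print one region per line, taken from the
# arguments if any were given, otherwise from the MyRegions file.
# Returns non-zero (after printing instructions) if neither is available.
resolve_regions() {
    regions_file=${MYREGIONS_FILE:-$HOME/MyRegions}
    if [ $# -gt 0 ]; then
        printf '%s\n' "$@"
    elif [ -f "$regions_file" ]; then
        cat "$regions_file"
    else
        echo "No regions given and $regions_file not found." >&2
        echo "Create it with one region code per line, e.g. DEU-NW." >&2
        return 1
    fi
}
```

A dashboard script could call `resolve_regions "$@"` first and exit if it fails, before doing any other work.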
For contributors like @michihdeu & me who always quickly fix our errors or mark them FP, any errors visible will be new ones. :)
True.
in https://github.com/TravelMapping/DataProcessing/issues/57#issuecomment-449699335, @michihdeu wrote:
Yes.
fpcull can be used for this purpose* right now. It wouldn't output the new data to Bash though -- just write it to a specified file. The official datacheck.log is on noreaster at /home/www/tm/logs/datacheck.log .
While adding functionality to siteupdate.py would be easy, I'm a bit iffy on the "right" way to do it. The official datacheck.log is at the path noted above, not at
logfilepath
or anything specified via the siteupdate.py commandline. Hard-coding a path into siteupdate.py that may or may not exist on a given system seems Bad Form -- and I don't want to add a new argument just for this. Perhaps the simplest thing to do is to execute fpcull from...
$datestr/$logdir
and hasn't yet replaced the one in $tmwebbase/$logdir
*Not the specific purpose it was designed for, but it'll work. It can be used for any case where you want to remove lines of text contained in one file from another file.