Closed JulienPalard closed 2 years ago
Thanks! I'm not sure I understand: what was the problem, and how does this PR fix it?
I see the nice additions, but I'd like to better understand your first message :)
I first ran my certificate-watcher, and from the result file (named `errors`) I did:
```shell
grep 'Name or service not known' errors |  # Find those not resolving according to certificate-watcher
cut -d: -f1 |                              # Get just the domain name
while read -r line
do
    if [ -z "$(dig A "$line" +short)" ]  # Ensure it does **not** resolve an IPv4
    then
        if [ -n "$(dig A "www.$line" +short)" ]  # Ensure it **does** resolve an IPv4 when www. is added
        then
            sed -i "s/^$line$/www.$line/g" *.txt sources/*.txt  # Fix it
        fi
    fi
done
```
After running that, I ran `scripts/sort.py *.txt sources/*.txt`.
So there's no addition:

```
7 files changed, 340 insertions(+), 340 deletions(-)
```

and:

```shell
diff domaines-organismes-publics.txt <(cat sources/*.txt | sort)
```

still reports no difference.
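That invariant check can be sketched on a throwaway copy of the layout; the file names match the repository, but the domain entries below are fabricated for illustration (and `sort` is used in place of the process substitution so the sketch stays POSIX-sh friendly):

```shell
# Sketch: verify that the aggregated file equals the sorted
# concatenation of the source files, using made-up domain entries.
tmp=$(mktemp -d)
mkdir "$tmp/sources"
printf 'beta.gouv.example\n' > "$tmp/sources/a.txt"
printf 'alpha.gouv.example\n' > "$tmp/sources/b.txt"
printf 'alpha.gouv.example\nbeta.gouv.example\n' \
    > "$tmp/domaines-organismes-publics.txt"
cd "$tmp"

# The aggregated file must equal the sorted concatenation of the sources.
sort sources/*.txt > expected.txt
if diff -q domaines-organismes-publics.txt expected.txt > /dev/null
then
    echo "invariant holds"
fi
```

With these fabricated entries the check prints `invariant holds`; in the repository itself, any unsorted or duplicated entry would make `diff` exit non-zero instead.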
Thanks for all these details. I won't be available to look at them for the next two days, but I will.
Sorry, switching back to English. I turned my brain on and finally grokked what's going on here; thanks a lot for this. I'm rebasing/merging now.
As we only want domains that return a 200 over HTTP, it's better if they resolve.
Browsers are forgiving nowadays: if a domain doesn't resolve to an address, they automatically retry with a `www.` prefix, so from a user's point of view those domains do return a 200 anyway. But if we want to script tools on top of this dataset, it's better to store the form that actually resolves.
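The fallback rule itself can be sketched without network access by stubbing the resolver: `resolve()` below is a hypothetical stand-in for `dig A "$1" +short`, and the domain names are invented for illustration.

```shell
# resolve() stands in for `dig A "$1" +short`: in this sketch only the
# www-prefixed name resolves, to an address from the TEST-NET range.
resolve() {
    case "$1" in
        www.service.example) echo "192.0.2.1" ;;
    esac
}

domain="service.example"
# Same rule as the dig loop in the first comment: no direct A record,
# but an A record once www. is prefixed, means the entry should be
# stored with the prefix.
if [ -z "$(resolve "$domain")" ] && [ -n "$(resolve "www.$domain")" ]
then
    domain="www.$domain"
fi
echo "$domain"
```

Here the sketch prints `www.service.example`: the bare name yields nothing, the prefixed one yields an address, so the prefixed form is the one worth keeping in the dataset.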