I wrote a Trash script to compare the parse trees of the grammars before and after reformatting (minus all intertoken attributes). There are no grammar differences. So, that much is fine.
#
# Compare the grammar parse trees (minus intertoken attributes) before and after the reformat.
set -x
set -e
before=6e78e1872264ca4b78bb89243ad005db102cf3c9
after=753536777d827ccc0c9b108531ea67375c2039ac
prefix=$(pwd)
# Pass 1: dump the parse tree of every grammar at the pre-reformat commit.
git checkout $before
directories=$(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    trparse -t ANTLRv4 *.g4 | trdelete ' //@*' | trtree > before.txt
    popd > /dev/null 2>&1
done
# Pass 2: same dump at the post-reformat commit.
git checkout $after
directories=$(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    trparse -t ANTLRv4 *.g4 | trdelete ' //@*' | trtree > after.txt
    popd > /dev/null 2>&1
done
# Pass 3: diff the two dumps per grammar; "|| true" keeps "set -e" from aborting on a difference.
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    diff before.txt after.txt || true
    popd > /dev/null 2>&1
done
I don't have a script to check comments yet, but it looks like the reformat should not have reflowed comments. That means I can "grep" the comments before and after reformatting and compare the two lists to see what is missing.
I had to change the Trash parse tool to create attributes named after the token type (https://github.com/kaby76/Domemtech.Trash/issues/434). Antlr4 grammars have three types of comments, so the trxgrep looks for DOC_COMMENT, BLOCK_COMMENT, and LINE_COMMENT. After grepping for comments, I removed the lines containing antlr-format as these were added by the reformatter.
#
# Compare the comments of each grammar before and after the reformat.
# set -x
# set -e
before=6e78e1872264ca4b78bb89243ad005db102cf3c9
after=753536777d827ccc0c9b108531ea67375c2039ac
prefix=$(pwd)
# Pass 1: extract all comments at the pre-reformat commit.
git checkout $before
directories=$(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    trparse -t ANTLRv4 *.g4 | trxgrep --no-prs ' //(@DOC_COMMENT | @BLOCK_COMMENT | @LINE_COMMENT)' | grep -v antlr-format > before.txt
    dos2unix before.txt
    popd > /dev/null 2>&1
done
# Pass 2: extract all comments at the post-reformat commit, dropping the antlr-format lines the formatter added.
git checkout $after
directories=$(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    trparse -t ANTLRv4 *.g4 | trxgrep --no-prs ' //(@DOC_COMMENT | @BLOCK_COMMENT | @LINE_COMMENT)' | grep -v antlr-format > after.txt
    dos2unix after.txt
    popd > /dev/null 2>&1
done
# Pass 3: report every grammar directory whose comments changed.
for g in $directories
do
    echo $g
    pushd "$g" > /dev/null 2>&1
    g=$(pwd)
    g=${g##*$prefix/}
    if ! diff before.txt after.txt
    then
        echo $g has diffs.
    fi
    popd > /dev/null 2>&1
done
Indeed, we now see a collection of differences in comments from the formatter. These grammars will all need to be fixed:
haskell
sql/derby
sql/tsql
https://github.com/antlr/grammars-v4/blob/753536777d827ccc0c9b108531ea67375c2039ac/sql/tsql/TSqlLexer.g4#L1292
It looks like some information was lost because of the automatic formatting. For example, at the end of this file:
turned into
Someone might want to compare that commit with its parent, with all whitespace stripped, to find other cases of lost comments, and then fix the tool so it doesn't remove comments.
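A whitespace-insensitive comparison along those lines could be sketched roughly as follows. This is only an illustration, not the procedure anyone in this thread actually ran: the `normalize` and `report_changed_files` names are hypothetical, and it assumes the only lines the formatter adds are the ones containing `antlr-format`.

```shell
#!/bin/bash
# Hypothetical helper: drop the lines added by the formatter, then remove
# every whitespace character so that pure layout changes disappear.
normalize() {
    grep -v 'antlr-format' | tr -d ' \t\r\n'
}

# Hypothetical helper: list the .g4 files modified by a commit whose content
# still differs from the parent version after normalization.
report_changed_files() {
    local commit=$1 f old new
    git diff --name-only --diff-filter=M "$commit^" "$commit" -- '*.g4' |
    while read -r f
    do
        old=$(git show "$commit^:$f" | normalize)
        new=$(git show "$commit:$f" | normalize)
        [ "$old" = "$new" ] || echo "$f differs beyond whitespace"
    done
}
```

Running `report_changed_files 753536777d827ccc0c9b108531ea67375c2039ac` from the repository root would then print the grammars worth inspecting by hand. Because all whitespace is stripped, a change that only alters spacing inside a string literal or comment would go unreported, so this is a coarse first filter rather than a proof that nothing was lost.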
To begin with, automatic formatting may not have been the best idea, considering that deliberate decisions were made for things like this bit from the T-SQL parser grammar: