DavidCox1979 / support-tools

Automatically exported from code.google.com/p/support-tools
Apache License 2.0

Archived issues have broken formatting #176

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Example: https://code.google.com/archive/p/sympy/issues/4105

You can see the correct formatting in the issue that was migrated to GitHub 
https://github.com/sympy/sympy/issues/7204.

The issue is that Google Code issues never had markup, so people would just 
paste code, tracebacks, and whatever else, and it looked fine. But now the 
Archive is adding markup to issues that never had it, making them hard if not 
impossible to read. 

Original issue reported on code.google.com by asmeurer@gmail.com on 26 Jan 2016 at 9:15

GoogleCodeExporter commented 8 years ago
Thank you for the bug report. And your assessment is spot on.

This is the content that exists in the Google Code Archive as part of the JSON 
dump. So this won't require reexporting anything, just a frontend JavaScript 
change. It looks like the conversion to Markdown isn't properly replacing \r\n 
with a <br>.

"\u0026gt;\u0026gt;\u0026gt; eye(1).free_symbols\r\nTraceback (most recent call 
last):\r\n  File \u0026quot;\u0026lt;stdin\u0026gt;\u0026quot;, line 1, in 
\u0026lt;module\u0026gt;\r\n  File 
\u0026quot;sympy\\matrices\\matrices.py\u0026quot;, line 3066, in 
__getattr__\r\n    \u0026quot;%s has no attribute %s.\u0026quot; % 
(self.__class__.__name__, attr))\r\nAttributeError: MutableDenseMatrix has no 
attribute free_symbols."
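The diagnosis above can be sketched in Python. This is a minimal illustration, not the Archive's actual frontend code (which is JavaScript). It decodes the JSON string, undoes the HTML entities stored in the dump, re-escapes the text for output, and turns each line break into a <br> without running any Markdown pass. The function name `render_plain_comment` is hypothetical, and the sample string is shortened from the dump above.

```python
import html
import json

# Shortened copy of the raw JSON field from the dump (a raw string, so the
# \u0026 and \r\n escapes reach json.loads unchanged).
raw = r'"\u0026gt;\u0026gt;\u0026gt; eye(1).free_symbols\r\nTraceback (most recent call last):\r\n  File \u0026quot;\u0026lt;stdin\u0026gt;\u0026quot;, line 1, in \u0026lt;module\u0026gt;"'


def render_plain_comment(json_string: str) -> str:
    """Render an archived comment as literal text with no Markdown (hypothetical helper)."""
    # json.loads turns \u0026 into "&" and \r\n into a real CR LF pair.
    text = json.loads(json_string)
    # The dump stores HTML entities (&gt;, &quot;, ...); decode them once.
    text = html.unescape(text)
    # Re-escape <, > and & for safe HTML output.
    escaped = html.escape(text, quote=False)
    # Normalize line endings, then make every line break explicit.
    return escaped.replace("\r\n", "\n").replace("\n", "<br>\n")


print(render_plain_comment(raw))
```

HTML still collapses runs of spaces, so the traceback's indentation would be lost with <br> alone; wrapping the escaped text in a <pre> element instead keeps both the line breaks and the indentation.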

Original comment by chrsm...@google.com on 27 Jan 2016 at 3:01

GoogleCodeExporter commented 8 years ago
Also, the __getattr__ shouldn't be bold. Really, Markdown formatting shouldn't 
be enabled at all, because people wrote code in issue comments assuming it 
wasn't there. 
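The bold text comes from Markdown's emphasis rules: a pair of __ delimiters around a word is strong emphasis, so __getattr__ renders as a bold "getattr". If the frontend has to keep a Markdown renderer at all, one option is to backslash-escape every ASCII punctuation character before rendering, since CommonMark renders an escaped punctuation character literally. The thread does not say which renderer the Archive uses, so CommonMark compatibility is an assumption here, and `escape_markdown` is a hypothetical helper.

```python
import string


def escape_markdown(text: str) -> str:
    """Backslash-escape all ASCII punctuation (assumes a CommonMark renderer).

    CommonMark treats a backslash before any ASCII punctuation character as
    a literal, which disables emphasis, links, headings, lists and so on.
    """
    return "".join("\\" + ch if ch in string.punctuation else ch for ch in text)


print(escape_markdown("in __getattr__"))
```

Escaping every punctuation character is blunt but safe; a narrower list of Markdown metacharacters would produce shorter output at the risk of missing a construct.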

Original comment by asmeurer@gmail.com on 27 Jan 2016 at 4:33

GoogleCodeExporter commented 8 years ago
I just noticed that usernames are broken as well. I'm now "happy elephant" 
apparently (see https://code.google.com/archive/p/sympy/issues/4133 and 
https://github.com/sympy/sympy/issues/7232). 

Original comment by asmeurer@gmail.com on 8 Feb 2016 at 9:48

GoogleCodeExporter commented 8 years ago
I'm starting to gather up Codesite Archive frontend bugs to fix. I don't have 
an ETA, but I'm working on it.

As for user names being replaced with things like "happy elephant", that is 
actually by design.

Google Code currently shows a semi-obfuscated user name (based on a setting). 
However, many people are surprised by this and don't want their email address 
to be discoverable on the internet. So we chose to replace Google Code profiles 
with opaque user IDs in the Archived version of issues.

For example, chrsmith maps to ID 12345. However, in order to keep users 
anonymous, that ID is project-specific, so chrsmith will map to a different ID 
in the archived version of a different project.
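The thread does not describe how the Archive actually derives these IDs or the "happy elephant" labels. One common way to get stable, project-specific, non-reversible IDs is a keyed hash (HMAC) over the project name and user name, sketched below. The secret key, the word lists, and the names `opaque_id` and `display_name` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the archive operator; without it nobody
# can recompute the mapping or link an ID back to a user name.
SECRET_KEY = b"replace-with-a-random-secret"

# Hypothetical word lists for human-friendly display names.
ADJECTIVES = ["happy", "quiet", "brave", "sleepy"]
NOUNS = ["elephant", "otter", "falcon", "badger"]


def opaque_id(project: str, user: str) -> int:
    """Derive a stable, project-specific numeric ID for a user."""
    # Including the project in the MAC input means the same user gets
    # unrelated IDs in different projects.
    msg = f"{project}\x00{user}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def display_name(project: str, user: str) -> str:
    """Map the opaque ID onto an adjective-noun label."""
    n = opaque_id(project, user)
    adjective = ADJECTIVES[n % len(ADJECTIVES)]
    noun = NOUNS[(n // len(ADJECTIVES)) % len(NOUNS)]
    return f"{adjective} {noun}"


print(opaque_id("sympy", "chrsmith"), display_name("sympy", "chrsmith"))
```

With word lists this small, many users share a label, so the numeric ID rather than the display name would have to serve as the key in the archived data.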

Original comment by chrsm...@google.com on 12 Feb 2016 at 6:56

GoogleCodeExporter commented 8 years ago
But in Google Code you can extract the real email via a captcha. Is there no 
way to see the real person who wrote an issue comment? Seems like unnecessary 
information hiding for an "archive". 

Original comment by asmeurer@gmail.com on 12 Feb 2016 at 7:04

GoogleCodeExporter commented 8 years ago
You are correct. In the most general case you will not be able to get the 
user's email address from the archived data dump.

There is a tradeoff to be made here, between having an accurate snapshot of all 
of Google Code's data and protecting users' email addresses from being stored 
on the internet (i.e. where spammers can harvest them). We opted for the latter.

We could have written the logic to keep the ability to crack open a captcha so 
project members could see the email address of an issue commenter, but this has 
a couple of major drawbacks:

- It requires we authenticate access to the Google Code Archive. Only project 
members should be able to see your email address, based on the issues you left 
on a project. So we would have to wire in Google auth/login to the Google Code 
Archive to preserve that check. (Otherwise we would be leaking data that was 
previously hidden on Google Code.)

- It requires the Google Code Archive to copy users' "Google Code Profiles", 
which include email addresses, display preferences, and so on. And again, this 
has been a source of confusion for people who were surprised that filing a bug 
report would make it possible for others to see their email address.

Between the Google Code-to-GitHub exporter and Google Takeout support for 
Project Hosting, there are ways to get more accurate information from issues. 
But you are correct in seeing that a year from now, when Google Code is 
replaced by just the Archive, some data will be lost. 

In the meantime, if you need to crack open captchas for any users who have 
reported issues on your project, let me know.

Original comment by chrsm...@google.com on 12 Feb 2016 at 7:16

GoogleCodeExporter commented 8 years ago
For my specific project (sympy) it looks like the exporter script we used 
preserved author links which still work (at least for now), like 
https://code.google.com/u/103073311122698598373/. 

Original comment by asmeurer@gmail.com on 12 Feb 2016 at 7:19