benjann / estout

Stata module to make regression tables
http://repec.sowi.unibe.ch/stata/estout/index.html
MIT License

Display warning on filetype confusion #36

Open · NilsEnevoldsen opened this issue 2 years ago

NilsEnevoldsen commented 2 years ago

There is a class of error that I see quite often, especially among novice Stata users. Call it "faith in the file extension". The most common manifestation I see is saving (not exporting) a graph file (i.e., a .gph) with an image extension (e.g., .png). But I recently saw a Statalist user make the same error with esttab and the .xls extension. (This almost works, because Excel can still open the resulting plain-text file.)

Commands cannot always save users from their own actions, but some of this confusion might be reduced if a warning were displayed in the following circumstance:

  1. Command is: esttab
  2. With: using filename (i.e., the table is written to a file)
  3. With document format option (e.g., rtf, csv, tex): not specified
  4. Where filename suffix is: doc, docx, pages, odt, xls, xlsx, numbers, ods, pdf.

The message would be something like this:

  - doc, docx, pages, odt: Writing a fixed-format file with extension ".$extension". You may want to write an RTF file instead.
  - xls, xlsx, numbers, ods: Writing a fixed-format file with extension ".$extension". You may want to write a CSV or SCSV file instead.
  - pdf: Writing a fixed-format file with extension ".pdf". You may want to write a TEX file instead.
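To make the proposed rule concrete, here is a minimal, language-neutral sketch in Python of the four conditions and the three messages above. It is not esttab's code (esttab is a Stata ado-file), and the function name filetype_warning, the SUGGESTIONS table, and the case-insensitive suffix match are hypothetical choices made only for illustration.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical table: filename suffix from the proposal -> format to suggest.
SUGGESTIONS = {
    **dict.fromkeys(["doc", "docx", "pages", "odt"], "an RTF file"),
    **dict.fromkeys(["xls", "xlsx", "numbers", "ods"], "a CSV or SCSV file"),
    "pdf": "a TEX file",
}


def filetype_warning(filename: Optional[str], format_option: Optional[str]) -> Optional[str]:
    """Return the proposed warning text, or None if no warning applies.

    Hypothetical helper. Condition 1 (the command is esttab) is implied by
    calling this at all; conditions 2-4 are checked below.
    """
    if filename is None:
        # Condition 2 fails: no "using filename", output goes to the screen.
        return None
    if format_option is not None:
        # Condition 3 fails: an explicit document format option was given.
        return None
    # Condition 4. Assumption: the suffix comparison is case-insensitive.
    ext = PurePath(filename).suffix.lstrip(".").lower()
    suggestion = SUGGESTIONS.get(ext)
    if suggestion is None:
        return None
    return (
        f'Writing a fixed-format file with extension ".{ext}". '
        f"You may want to write {suggestion} instead."
    )


if __name__ == "__main__":
    print(filetype_warning("table.xls", None))   # warns, suggests CSV/SCSV
    print(filetype_warning("table.xls", "csv"))  # None: format given explicitly
    print(filetype_warning("table.txt", None))   # None: suffix not on the list
```

Because the check only fires when no format option is given, users who already pass an option such as csv or rtf would never see the message, which keeps the proposed warning narrowly targeted at the "faith in the file extension" case.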

wbuchanan commented 2 years ago

@NilsEnevoldsen The alternatives that you are listing are not comparable file formats. The first suggestion that I would have is to drop .doc and .xls. These file formats aren't supported consistently in current versions of MS Word/Excel, and the chances that anyone is working with someone who needs a file format that was already almost 20 years old when it was initially discontinued (see Microsoft Office Word 97-2003 Binary File Format (.doc) for additional info) are probably fairly slim. The OpenDocument files (.odt, .ods) and .docx and .xlsx are all XML-based formats and are largely exchangeable; I assume the same is true for the Apple-specific applications as well. Lastly, it isn't clear what specifically you mean when you say "fixed-format file" which could confuse end-users more.

More importantly, your suggestion also doesn't address the same problem when someone specifies csv or rtf file extensions but really wants a file that will open with the appropriate formatting in some type of Office related product; while RTF does contain formatting information, the same is not true for CSV.

NilsEnevoldsen commented 2 years ago

> The alternatives that you are listing are not comparable file formats.

I agree. These messages may require wordsmithing to avoid that implication, but variations I considered involved a length-vs-clarity tradeoff.

> The first suggestion that I would have is to drop .doc and .xls.

The purpose of the warning would be to help confused users, who may not understand the difference between .xls and .xlsx. For example, in the Statalist post that prompted me to create this issue, the user specified .xls. Since "supporting" .xls would take almost no additional effort, and since I can think of no downsides to including it alongside the rest of the proposed feature, I think it should be included if this feature is implemented.

> Lastly, it isn't clear what specifically you mean when you say "fixed-format file" which could confuse end-users more.

I agree it's not the clearest name. I am just using the terminology from the estout documentation.

> More importantly, your suggestion also doesn't address the same problem when someone specifies csv or rtf file extensions but really wants a file that will open with the appropriate formatting in some type of Office related product; while RTF does contain formatting information, the same is not true for CSV.

I agree, but that issue is hard to fix. This issue is (I think) easy to fix. If a pull request would be accepted, I might even write it.

I do think that CSV is probably the best estout-supported format to recommend to someone who thinks they want "an Excel file", even though it carries less formatting than RTF.

wbuchanan commented 2 years ago

I don't think there is actually a problem here. This behavior has long been established in estout. It seems more like trying to engineer a solution to a problem that doesn't substantively exist.

NilsEnevoldsen commented 1 year ago

Another example from the wild: https://www.statalist.org/forums/forum/general-stata-discussion/general/1728988-estout-exporting-negative-p-values

NilsEnevoldsen commented 6 months ago

Another example from the wild: https://www.statalist.org/forums/forum/general-stata-discussion/general/1753267-is-it-possible-to-do-nice-latex-excel-word-tables-with-reghdfe