codebybrett / rebol-source-scripting

Scripts for processing the source files of Rebol.
Apache License 2.0

C Source Header Conversion #1

Open hostilefork opened 8 years ago

hostilefork commented 8 years ago

If you get an interest in a next task for source conversion (and if you don't already have some things you want to do planned first)... the big one I'd suggest is to begin coming up with a new header format for files.

I'd suggest going with an identical pattern, for consistency and also because it's probably easiest:

//
//  Rebol/C [
//      Title: {...}
//      File: %...
//      Function-Scan: yes
//      OS: [linux windows]
//      ...
//  ]
//
// Single spaced expository notes here so they don't have to wind up indented
// all funny.
//

I give "Function-Scan" and "OS" as a hypothetical idea of the kind of attribute we'd be wanting to track that is currently tracked in a table. Take a look at how + is used here:

https://github.com/metaeducation/ren-c/blob/332bd46689e9ad690989312d751157387b676829/src/tools/file-base.r#L158

It may or may not be a good example of whether it should be in the source file or in a separate table. Whether that specific one belongs in the files or out-of-band isn't so much the point, as the idea that it be easy to switch between them if need be.

So some way of getting a database scanned up of all the source file headers and query them during the make...like the database about the functions...would be the idea.
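A minimal sketch of that "scan the headers into a database, then query it" idea, written in Python purely for illustration (the real build tooling is Rebol). It assumes the hypothetical `Rebol/C [ ... ]` layout shown above, keeps every value as a raw string, and the names `scan_header` and `query` as well as the sample title are invented for the example:

```python
import re

def scan_header(source: str) -> dict:
    """Collect `Key: value` pairs from a leading `//  Rebol/C [ ... ]` block."""
    fields = {}
    inside = False
    for line in source.splitlines():
        if not line.startswith("//"):
            break  # the header ends at the first non-comment line
        text = line[2:].strip()
        if not inside:
            inside = text.startswith("Rebol/C [")
            continue
        if text == "]":
            break  # closing bracket of the metadata block
        match = re.match(r"([A-Za-z][\w-]*):\s*(.*)", text)
        if match:
            fields[match.group(1)] = match.group(2)  # value kept as raw text
    return fields

def query(database: dict, key: str, wanted: str) -> list:
    """Return the file names whose `key` field lists the word `wanted`."""
    return sorted(
        name for name, fields in database.items()
        if wanted in fields.get(key, "").strip("[]").split()
    )

# Hypothetical header content, following the proposed pattern.
SAMPLE = """//
//  Rebol/C [
//      Title: {Example device}
//      File: %dev-signal.c
//      OS: [linux]
//  ]
//
// Expository notes would go here.
//
int placeholder;
"""

database = {"dev-signal.c": scan_header(SAMPLE)}
linux_files = query(database, "OS", "linux")
```

Whether an attribute such as `OS` lives in the file or in a separate table, a lookup of this shape would let the make step ask the same question either way.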

This conversion would be a chance to update the copyrights to read something roughly more along the lines of:

(c) 2007-2016 REBOL Technologies
(c) 2012-2016 Rebol Open Source Contributors
Portions (c) Saphirion AG, Atronix Engineering

All sources released under the Apache 2.0 License
see LICENSE and CREDITS.md for more information

REBOL is a trademark of REBOL Technologies

Can try and figure out what the best wording on that is...

Anyway, just something I think would be nice to happen before doing a big review of all the files and explaining what each one is there for. There are some comments that need to be brought up to date, and the intro at the top of the file is important to have...

codebybrett commented 8 years ago

I've added this to my to do list, but have not got the brain space to start it right now.

hostilefork commented 8 years ago

No problem. Though here is an example format I am playing with, by no means final:

https://github.com/metaeducation/ren-c/blob/master/src/include/sys-series.h

Horizontal dividers are fine for "sections"; I just don't like them on every single function. I also don't like having to type multiple spaces after the // before writing exposition, and I want the text to be able to reach the right margin.

Perhaps a background brain process can turn it over while you don't realize it is...

codebybrett commented 8 years ago

Legal reference:

I found Managing copyright information within a free software project to be highly educational.

Copyright notices

There are about 9 files where RT's copyright notice is not present; they were created by Atronix or Saphirion. So it seems reasonable to use the creator's first copyright statement (the first copyright statement line in the original file) and add a second, a contributors statement that refers to Credits.md. The second is justified on the assumption that someone did in fact modify the original in a material way.

There is one instance where Saphirion used an oddly worded copyright notice. I assume it can be removed, since it should be covered by the contributors copyright statement, assuming Saphirion is listed as one of those contributors in Credits.md (perhaps including a reference to the file). We'll need to create a Credits.md.

I'm not sure that text like "Portions Copyright 2016 XXX" has much useful effect if it does not identify the portions in question.

Licence notices

The above article discusses the pros and cons of an option to centralise licence notices. There is precedent in the .R files (perhaps they are too lean). I'm unsure about it. My bias is to leave the licence notice in the files rather than take the centralised approach. Boilerplate to centralise the notice may not be received well and will not save a lot of lines anyway.

Meta data

I'm assuming the existing meta data has not been parsed out and used in a programmatic way before this project.

Analysing the meta fields of the existing files shows the most popular meta field keys, in order of popularity, to be:

Author  Summary  Module  Notes  Section  Title  Purpose

There are 11 other keys used - generally appearing in a single file only. Most of these could probably be considered to be part of notes.

Perhaps:

The existing meta data format key: value seems to work pretty well even where value runs over multiple lines. It has a simplicity of editing and reading.

Putting notes in the section following meta data makes editing easier as you have in your example.
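As a rough illustration of why the key: value format stays easy to handle even when a value wraps, here is a small Python sketch. It assumes the comment decoration has already been stripped from each line and that any line not starting with a `Key:` prefix continues the previous value; `parse_fields` is a name made up for the example:

```python
import re

KEY_LINE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")

def parse_fields(lines):
    """Parse `Key: value` lines; unkeyed lines continue the previous value."""
    fields = {}
    current = None
    for raw in lines:
        line = raw.strip()
        match = KEY_LINE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None and line:
            # Continuation line: append it to the value being accumulated.
            fields[current] = (fields[current] + " " + line).strip()
    return fields

fields = parse_fields([
    "Summary: Low level memory-oriented access",
    "    routines for series",
    "File: %mem-series.h",
])
```

One known weakness of this simple rule: a wrapped line that happens to begin with a word followed by a colon would be misread as a new key.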

Draft

Here are some draft examples:

For this draft I used a different section line format than your example, for interest.

hostilefork commented 8 years ago

For this draft I used a different section line format than your example, for interest.

There's something to be said for a lighter touch with horizontal breaks, or lack thereof. Certainly that's the way the function prototypes went... and I'm glad for it there.

But I'm trying to reserve the "heavy" lines for things like section dividers in the files, and there they come in handy. The issue with the heading is that if it's lighter than the section headers, then it feels unbalanced. It's like having a page heading in a smaller font than lower headings, essentially.

The one thing though that I do want to preserve is not having the two space indents for exposition. Again, it's okay for the code--and ideal for it since it has tab points. But something about having a split where some comment text is double spaced in and some isn't is just off.

I found Managing copyright information within a free software project to be highly educational.

Good you're doing the legwork on that, though we do have access to both Saphirion and Atronix to just ask what they'd be okay with. So I'd worry more about making it as good as possible, then asking them if it's okay, than trying to second guess what they'd be okay with or not a priori.

I'm not sure that text like "Portions Copyright 2016 XXX" has much useful effect if it does not identify the portions in question.

The "portions" doesn't make much sense, no. I think the easiest thing is just to have some chronological drops somewhere of files at various states of development if anyone actually cares about the history...

My bias is to leave the licence notice in the files rather than take the centralised approach.

I don't think the Apache license text inclusion is excessive...it's fine to have it in.

We'll need to create a Credits.md.

Yup, guess that's a matter of digging up all the crediting out there so far and see what to make of it.

I'm assuming the existing meta data has not been parsed out and used in a programmatic way before this project.

Almost certainly has not!

Generally I'm imagining that the metadata is mostly worthless, though good you're doing a survey to find that out. :-) What I'm more interested in is how the metadata can help in terms of informing the build process going forward...anything that helps reduce the number of files you have to touch to adapt the build.

Looks great so far...perhaps the 2016 copyright changeover isn't too far away...

codebybrett commented 8 years ago

An update..

I'm trying to reserve the "heavy" lines for things like section dividers in the files

I have implemented your desired heavy/decorative section format.

Good you're doing the legwork on that, though we do have access to both Saphirion and Atronix to just ask what they'd be okay with. So I'd worry more about making it as good as possible, then asking them if it's okay, than trying to second guess what they'd be okay with or not a priori.

Hmmmmm...... :-/

I'm imagining that the metadata is mostly worthless

Changes made to metadata:

replace meta 'Title 'Summary
replace meta 'Module 'File
replace meta 'Note 'Caution

move-key-to-notes 'Compile-note
move-key-to-notes 'Flags
move-key-to-notes 'Usage
move-key-to-notes 'Design-comments
move-key-to-notes 'Warning
move-key-to-notes 'Description
move-key-to-notes 'See
move-key-to-notes 'Purpose
move-key-to-notes 'Special-note
move-key-to-notes 'Caution
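For readers who don't read Rebol, the effect of the list above can be sketched in Python. This assumes `replace meta 'Title 'Summary` renames the key Title to Summary (search first, replacement second), that no file carries both the old and the new key, and that a moved field is written into the notes as a plain "Key: value" line; the last two points are guesses, and `normalize` is a made-up name:

```python
RENAMES = {"Title": "Summary", "Module": "File", "Note": "Caution"}

MOVE_TO_NOTES = [
    "Compile-note", "Flags", "Usage", "Design-comments", "Warning",
    "Description", "See", "Purpose", "Special-note", "Caution",
]

def normalize(meta: dict):
    """Apply the key renames, then pull the listed keys out into notes."""
    renamed = {RENAMES.get(key, key): value for key, value in meta.items()}
    notes = [
        f"{key}: {renamed.pop(key)}"  # assumed note layout
        for key in MOVE_TO_NOTES
        if key in renamed
    ]
    return renamed, notes

# Hypothetical field values, only to show the flow.
meta, notes = normalize({
    "Module": "m-gc.c",
    "Title": "Example summary text",
    "Note": "Example caution text",
    "Usage": "Example usage text",
})
```

Note that a `Note` field is first renamed to `Caution` and then moved into the notes, which is why `Caution` appears in both lists.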

What I'm more interested in is how the metadata can help in terms of informing the build process going forward

Next to look at.

hostilefork commented 8 years ago

Hadn't replied because there were no questions here, but... now I will just to check in and mention a couple thoughts -

I have implemented your desired heavy/decorative section format.

Great! It may not be... perfect. But we can put it out for review with earl and Shixin and everyone.

The metadata is good, and I don't want you to work too hard on this one :-) because there are a lot of more important and probably more interesting things to do. With the headers converted all we need is some administrative sign off on the copyrights/credits.md and to figure that out.

In fact, on this one it would be super helpful if you could try and put together a CREDITS.md from the various file headers and info you've scraped. I've made credits lists before and I think the more important use is where the components came from, vs. squabbling over copyrights and whatnot. e.g.

https://github.com/hostilefork/blackhighlighter/blob/master/CREDITS.md

So I was thinking the file would serve both purposes: code and persons/organizations or whoever felt like being attributed.

What I'm more interested in is how the metadata can help in terms of informing the build process going forward

Next to look at.

The biggie there--and you've been in it enough to know--is how might Rebol's own build scripts be a showcase, or at least not a liability. Why so much code? Why so much imperative?

It's easy to let something that's not a priority for you stay bad for a long time. I started %r2r3-future.r just so there would be something... once I realized that legacy had to go both ways even if one were willing to throw out history, because of bootstrap. But the file was just plopping down "something" each time bootstrap broke and "I'll get to that someday":

https://github.com/metaeducation/ren-c/blob/c78d5a2cf044a4a43c0b506b0c777ee93c41171c/src/tools/r2r3-future.r

And then I decided someday would be today, and it took me almost no time to get it to:

https://github.com/metaeducation/ren-c/blob/master/src/tools/r2r3-future.r

Which was--as I'm sure you have experienced--one of those little feel-good moments of "why this is an interesting idea for how a language might work". You try to imagine that kind of thing happening in another language and just can't see it. This is why all the other work and toil would be worth it, to get that nailed and let others come in and "get it". Even with all the pain and breakage, that's the interesting bit.

(Sidenote, taking a peek to make a remark or two, the Rebol files in tools etc. have outdated headers and should probably be sync'd to match, but then that opens up a big issue about what the ideal header format is for those. But they should match the copyrights etc.)

Anyway, I'm sure you know what I mean, but that's sort of the gripe or the broad stroke question. Why do the bootstrap files so often suck, and what can we do. The more the C source files being talked about themselves can help with the reduction of suck, the better. :-)

codebybrett commented 8 years ago

Here's a first draft credits file based on my limited understanding of the area.

codebybrett commented 8 years ago

I should add that it is a condition of the jpeg license that the readme be distributed with the source. This apparently hasn't been done (although I haven't looked hard), probably because when the original source was included it was an exe only distribution which falls under a different licence condition.

hostilefork commented 8 years ago

Here's a first draft credits file based on my limited understanding of the area.

That errs on the side of not mentioning people like us at all, who are doing all the work these days...

Sigh. It's hard looking at some of the files and thinking REBOL Technologies is getting way too much billing on much of the important code at this point. :-) But I think resisting the urge to react to that with putting more names or credited organizations and focusing on the Rebol Open Source Contributors is the better path.

Perhaps one of the best ways to balance credit is to let people supply a web link of their choice in the CREDITS.md--and to extend this offer to pretty much anyone who has contributed. Then everyone gets a line to summarize what they've done and then a link to write as much as they want after that? That seems a good balance of not overwhelming things.

Maybe you could be the first to supply your line and a demo link, and then I could do so at http://rebol.hostilefork.com or something, and we could put those forth as examples that Ladislav and Andreas and everyone else could follow. They could opt to make a custom landing page to talk about their Rebol involvement or just say the line and link their main page.

I should add that it is a condition of the jpeg license that the readme be distributed with the source. This apparently hasn't been done

Good to check up on these things. I was hoping to do a refresh of all the libraries, and have a little extractor like what I did for zlib. (Which reminds me, with https built in these things should be able to work over the network. Also I've been thinking ZIP format should be supported by the Rebol executable... unzip.reb is 14k but could probably be smaller. I'd file "getting that binary parse and API in tip-top-shape" in the "probably more fun than module headers" category :-P...)

codebybrett commented 8 years ago

Great idea to have a simple list with links to more information.

I've put up some conversion notes including some draft proposals in the readme for this scripting project.

Linked to that is my response to your credits idea. What I've done is create a syntax where the first line is copyright owner that contributes their copyright material, second line or subsequent as the identification of the writer. Companies then get a single opportunity for their own links. The semicolon separates the name from a comment thereby retaining the opportunity to parse the list for reformatting or whatever.

Github markdown doesn't respect line breaking so I used a bullet point. It is a bit awkward I guess, but then each point stands out as an individual contributor identity.
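To show what "retaining the opportunity to parse the list" could look like, here is a tiny Python sketch. It assumes each bullet holds a name, optionally followed by a semicolon and a comment; the exact layout of the draft is not reproduced here, and `parse_credit` and the sample entries are invented:

```python
def parse_credit(line: str):
    """Split one bullet entry into (name, comment) at the first semicolon."""
    entry = line.lstrip("*- ").strip()  # drop the markdown bullet marker
    name, _, comment = entry.partition(";")
    return name.strip(), comment.strip()

with_comment = parse_credit("* Example Person; wrote the example scripts")
without_comment = parse_credit("* Example Organisation")
```

Because only the first semicolon is significant, a comment can itself contain semicolons without breaking the split.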

Thoughts?

codebybrett commented 8 years ago

I now have a new branch with an upgraded proto-parser to parse the metadata out of file headers that follow the proposed file header format. The metadata (second section of the header) is just key value fields where values are strings. The strings can be converted depending on the key as needed.

>> do %common-parsers.r
>> proto-parser/emit-fileheader: function [][meta: proto-parser/data ?? meta]
>> proto-parser/process read/string %../include/mem-series.h
meta: [
    Summary {Low level memory-oriented access routines for series}
    File "%mem-series.h"
]
== {//
// Rebol 3 Language Interpreter and Run-time Environment
// "Ren-C" branch @ https://github.com/metaeducation/ren-c
......

It is ready for a PR, but there will not be a lot of value from it until after the header conversion.

Btw I have some stuff coming up which will generally put this work on hold towards the next week for some weeks.

hostilefork commented 8 years ago

It is ready for a PR, but there will not be a lot of value from it until after the header conversion.

Great!

Btw I have some stuff coming up which will generally put this work on hold towards the next week for some weeks.

That's too bad because there will be some new features every couple days, I hope, to be looking at!

But if there's a chance of there being some technical glitch it would be best to do the conversion when you're available, and after everyone's on board with the credit format. I can merge this and go along with it as a test in the meantime if that helps...

hostilefork commented 8 years ago

Okay, so Robert said it was cool with him if we had a unified credits list. That's good. I suggested we put the draft CREDITS.md up as a wiki and let people edit in what they want it to say and get it hammered out before we put it in as the README.

Plan of action I'd say then is one earlier-than-Rebol-Open-Source-Contributors (c) per file, and if it just says Saphirion or Atronix but not really anything about REBOL Technologies then give it to them (for instance FFI to Atronix). Then after whatever the last date that was, have the line for Rebol Open Source Contributors. (There's a couple of superfluous periods on those lines I noticed.)

Also: I think the Rebol code portions should be indented like the headers are... so two spaces in. And lose any horizontal spacing after that. So:

//  Property: "Value"
//  Property-Long-Name: "Other Value"

...as opposed to:

//  Property:           "Value"
//  Property-Long-Name: "Other Value"

We should also put the format out for general review when it's a little closer for earl and Shixin, and see if they'd like to chime in...

Imagine how great it will be when this is over! :-)

codebybrett commented 8 years ago

... Robert said it was cool with him if we had a unified credits list...

Great!

... CREDITS.md up as a wiki and let people edit ...

Good idea. I plan to delete https://github.com/codebybrett/temporary.201512-file-headers, so that could be useful for a transitory wiki but not if history needs to be kept.

Plan of action...

This was hard to understand. However I think what you're saying is what I've already described and done, except that you want the start date of the open source contributors line (the second copyright notice) to be dependent upon the first copyright notice (instead of the constant 2012 that I currently have). I have made that change. So we have:

There's a couple of superfluous periods on those lines I noticed.

I have removed the period from the second copyright notice.
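The dependency between the two notices can be sketched as follows, in Python for illustration. It assumes a "Copyright YEAR Name" wording and that the contributors line starts at the latest year found in the first notice; both are readings of the discussion above rather than the converter's actual code, and `contributors_notice` is a made-up name:

```python
import re

def contributors_notice(first_notice: str, current_year: int) -> str:
    """Build the second notice, starting at the first notice's latest year."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", first_notice)]
    if not years:
        raise ValueError("no year found in the first copyright notice")
    start = max(years)
    span = str(start) if start == current_year else f"{start}-{current_year}"
    # No trailing period, matching the change described above.
    return f"Copyright {span} Rebol Open Source Contributors"

rt_line = contributors_notice("Copyright 2012 REBOL Technologies", 2016)
atronix_line = contributors_notice("Copyright 2014 Atronix Engineering", 2016)
```

With a constant 2012 the second call would have claimed two years before the file's first notice, which is the mismatch the change avoids.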

... the Rebol code portions ...

The metadata section is not Rebol syntax at the moment; it's a custom text format that can support multi-paragraph text values. The code was needed to parse the original headers, so I've put it in as the metadata format for now. It does have the benefit of minimising the editing burden if nearly all values are strings. Making the metadata Rebol syntax would obviously simplify parsing, at the expense of an increased editing burden. This format does not preclude the usage of Rebol syntax, so we can leave this until it is clearer how the metadata values might be used and what their types might be.

Many files have an Author which is usually Carl Sassenrath but not always. Some people added their name to this field and in a couple of instances it contains extra information. The oddities:

"Carl Sassenrath (REBOL interface sections)" "%s-crc.c"
"Carl Sassenrath, Joshua Shireman" "%dev-serial.c"
"Carl Sassenrath, Ladislav Mecir" "%f-math.c"
"Carl Sassenrath, Ladislav Mecir" "%reb-c.h"
"Carl Sassenrath, Ladislav Mecir, HostileFork" "%m-gc.c"
"Carl Sassenrath, Richard Smolak" "%host-lib.c"
"Carl Sassenrath, Richard Smolak, Shixin Zeng" "%host-process.c"
"Carl Sassenrath, Shixin Zeng" "%c-function.c"
"Ladislav Mecir" "%reb-dtoa.h"
"Ladislav Mecir for REBOL Technologies" "%f-deci.c"
"Richard Smolak" "%host-core.c"
"Shixin Zeng" "%dev-signal.c"
"Shixin Zeng" "%t-library.c"
"Shixin Zeng" "%f-int.c"
"Shixin Zeng" "%reb-struct.h"
"Shixin Zeng" "%t-strut.c"
"Shixin Zeng" "%t-routine.c"
"Shixin Zeng" "%p-signal.c"

Clearly removing Author from the files is desirable, but I'm not sure that I'd want to lose the information on what files Carl put his name to. Perhaps that information can go into Credits or Readme or something, or maybe rename it to be "Created by" and stick it in the notes. For other people, perhaps this list can remind them for their Credits entry.

hostilefork commented 8 years ago

Clearly removing Author from the files is desirable, but I'm not sure that I'd want to lose the information on what files Carl put his name to. Perhaps that information can go into Credits or Readme or something, or maybe rename it to be "Created by" and stick it in the notes. For other people, perhaps this list can remind them for their Credits entry.

Sure, why not put that as the person's line in the credits entry for starters. If that's what people want on their blurb that's what it can be. If not, they can take it out.

I think that if historians wonder which file exactly Carl thought worth putting his name on, they can look at R3-Alpha... he is pretty well covered by REBOL Technologies. :-)

I think what you're saying is what I've already described and done, except that you want the start date of the open source contributors line (the second copyright notice) to be dependent upon the first copyright notice (instead of the constant 2012 that I currently have).

You seem to have it pretty much down. And I think Robert's agreement is more like "not a problem" vs. "highly concerned in making an issue about it". (Being acknowledged in the credits and being weblinked these days is more useful than a copyright designation on an Apache-licensed file with lots of contributions online, which is part of the whole point of this simplification to get rid of the cruft and not let more cruft in...)

This format does not preclude the usage of rebol syntax so we can leave this until it is clearer how the metadata values might be used and what their types might be.

Ok, no problem. It's more a thought as I didn't have a real concrete idea for what we'd do with it yet. Better to go ahead and get things into a new format so that the files can start getting proper notes and descriptions... that's the real holdup.

codebybrett commented 8 years ago

Ok cool.

I have removed Author from the sources to be bulk converted, and updated the draft credits file to include the association of these removed author names to the source files (except for Carl).
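The association of author names to files can be derived mechanically from the Author list earlier in the thread. A minimal Python sketch, assuming names are comma-separated and that parenthetical remarks and "for ..." suffixes are not part of the name; `files_by_author` is a made-up name and only a few of the listed pairs are used:

```python
import re
from collections import defaultdict

def files_by_author(pairs, skip=("Carl Sassenrath",)):
    """Map each author name to the files whose Author field mentioned them."""
    index = defaultdict(list)
    for authors, filename in pairs:
        for name in authors.split(","):
            # Strip remarks such as "(...)" and trailing "for <organisation>".
            name = re.sub(r"\s*\(.*?\)|\s+for\s+.*$", "", name).strip()
            if name and name not in skip:
                index[name].append(filename)
    return dict(index)

credits_index = files_by_author([
    ("Carl Sassenrath (REBOL interface sections)", "%s-crc.c"),
    ("Carl Sassenrath, Ladislav Mecir", "%f-math.c"),
    ("Ladislav Mecir for REBOL Technologies", "%f-deci.c"),
    ("Shixin Zeng", "%f-int.c"),
])
```

The `skip` default mirrors the "except for Carl" choice; passing `skip=()` would keep every name.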

So I guess it's pretty much ready for discussion. I'll leave that with you.

Atronix and Saphirion and contributors will need to check the draft source files to see that they are happy.

In particular, Atronix needs to confirm that it is only their copyright (not Rebol Tech's) on these two files (because I removed the RT copyright notice on the basis that it looked like it was Atronix's work):

%src/core/p-signal.c
%src/os/linux/dev-signal.c

I have just noticed the NOTICE file in the root folder :-/ So the Third Party component part of Credits could be replaced with a link to NOTICE (bit of a waste of my time). I haven't compared the list (going to dinner) but I understand from what you said that you'll be reviewing components anyway.

codebybrett commented 8 years ago

Post conversion outstanding items: