IHDE-Alliance / solarnet_metadata

SOLARNET metadata for solar observations
Creative Commons Zero v1.0 Universal
4 stars 2 forks source link

Adding all of the text of the original word file for the Solarnet recommendations #5

Closed ehsteve closed 1 week ago

ehsteve commented 4 months ago

This merge request should include all of the original text from the solarnet document with a few small changes. The original text can be found here.

To see a rendered version of the documentation visit readthedocs. The PDF version can be found here.

This pull request should lead to the first official release.

ehsteve commented 4 months ago

@tfredvik yes! that is the point. I labelled the pull request as draft and then added you as reviewer which probably notified you. I assumed that you would have gotten a notification only once I removed the draft label. Sorry about that! Ignore this until it is not a draft and is ready for review. I'll re-add you both as reviewers once it is ready.

ehsteve commented 4 months ago

Oh sorry, actually i had not marked it as draft but now i have!

ehsteve commented 2 months ago

@steinhh and @tfredvik, I think we are ready for your review. All of the content should now be there (thanks to a lot of help from @Alrobbertz!). There are a few open issues that we wanted to check with you on.

  1. Section numbering is now done automatically but this does not play well with existing section numbers including the highest sections Part A, Part B, etc. Appendices also have their own automatic section numbers which differ from the hard-coded appendix numbers which is confusing. How should we proceed?
  2. Original section 19 and 20 are now potentially obsolete because indexing is now done automatically. Do we keep them? Do we keep a separate list of the solarnet keywords still?
  3. Review the keyword descriptions in Section 13. Is there anything else we want to be added to this table?
steinhh commented 2 months ago

Hi @ehsteve! So far we've only glanced at it, but here's Terje's comments:

I don’t understand Steven’s question 2. What automatic indexing is making the keyword lists obsolete? I don’t understand what his 3rd question mean.

--

To which I'll add that it is essential to preserve the original section numbering in such a way that existing references in other documents (papers, metadata standards, etc) are still valid and correct. Not least the companion document regarding simulations, where section numbers have been "synchronised" and quite a few sections are simply "See Section X in the other document".

At worst, sections should be kept even if they end up containing nothing other than "See Section X". If it ends up with a double redirect from the simulation paper, that's just a funny side effect ;-)

ehsteve commented 2 months ago

@steinhh thanks for the feedback!

  1. Fixed
  2. Please check again. I see the section but i may have fixed it without realizing it.
  3. Sorry, but i flew through and updated all of the sections and so now I am not sure which one was Section 13!
  4. Not sure what you mean. See below for providing pointers to where you see individual issues.
  5. Those were probably escape characters introduced by the original docx to markdown translation. Thanks for pointing them out. I think I fixed all them all with a project wide search.
  6. Deleted. Sorry, we left that in by mistake.

I understand your concerns about keeping the original section numbering. I had tried to match it with automatic numbering with sphinx but could not match it exactly. I've now removed that automatic numbering and have manually added all of the section numbers to their original values.

Regarding question 2 and Part C, we wanted to talk to you about the content there. For one, the index is here. It is generated automatically by sphinx and it just looks at the list of keywords in this file. We can easily add more files to add elements to the index (like the other FITS keywords). This seems to be doing exactly what you are doing in Part C. You have two sections here 19 and 20 but from an index perspective, I don't see a reader wanting to have to figure out whether a keyword they want more information about is a solarnet keyword or not. A single index makes more sense to me.

Regarding question 3, the table in Section 19 is something i had suggested to you by email before we met! The idea is to create a quick dictionary description of all keywords so that users can easily look up a keyword to see what it means. This is also generated from the csv file. We can therefore easily add columns to this table. For example, we can add a column that notes whether it is a solarnet keyword or not.

To provide pointers to specific problems, if you click on Files Changed you'll see every single file. You can then click on a line number in a specific file to add a comment at that specific spot.

ehsteve commented 2 months ago

Have either of you had the chance to look at the automatically generated PDF?

tfredvik commented 1 month ago

The pdf looks ok, but there are a few issues when comparing it to the readthedocs:

1) The section numbering is confusing. E.g. the Section called “Grouping" is Section “2.1.7 7”?

2) There’s an error in the section numbering in both the pdf and readthedocs: In our document there are two sections following each other called “7.4 Mosaics” and “8. Pipeline processing applied to the data”. In readthedocs both sections are numbered 7. In the pdf they are numbered “7.4” and “2.1.8 7.4”.

3) Most of the keywords in the pdf are not fixed width, only the keywords in the grey boxes are.

4) I think the header examples should support 80 characters, as readthedocs does. E.g. the first grey box on page 59, Appendix II, should contain four lines only, the three last lines should start with CONTINUE.

5) In readthedocs: Section 2.4 Comments is listed in the index, but it’s not in the document. The section is present in the pdf.

I've earlier mentioned an issue concerning colouring of the header examples. The issue is present in the pdf as well. As an example, turn once again to the first grey box on page 59, Appendix II. The “and”s in the keyword comments are coloured green. The same is the case for the numbers, “type” and “is”s in the keyword comments of the bottom gray box on that page.

On 12 Sep 2024, at 21:13, Steven Christe @.***> wrote:

Have either of you had the chance to look at the automatically generated PDF? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

-- Cheers, Terje

steinhh commented 1 month ago

Regarding indexing/reference, for sure it's a bit overkill (and confusing) to have three sections. Ideally, we would like to have them all combined into one. I.e., an alphabetical list where SOLARNET keywords are highlighted (e.g., red text), with the brief explanations and section references as links. Would take a bit of scripting, yes... but doable? (We think those SOLARNET keywords should stand, in particular for folks who are already familiar with fits files and who want to know "what's new"). I guess the non-essential elements are keyword sections as links, followed by the section numbers themselves, and the coloring. But it would be really useful to have at least the descriptions + links together.

ehsteve commented 1 month ago

Thank you both for the feedback. We will work on it. @Alrobbertz, I won't be able to get to this this week. Do you have time?

Alrobbertz commented 1 month ago

Thank you both for the feedback. We will work on it. @Alrobbertz, I won't be able to get to this this week. Do you have time?

Yes, I can work on this later this week.

ehsteve commented 1 month ago

@Alrobbertz I realized that i can't switch the readthedocs to this repo (instead of mine) because it does not contain readthedocs.yaml file yet (because it is in this pull request!). We will have to switch it over once this PR is merged.

Alrobbertz commented 1 month ago

Regarding Terje's comments:

  1. I Fixed the section numbering in the PDF so neither the PDF nor the HTML use auto-generated section numbers.
  2. This was just a manual section number fix.
  3. This was a more challenging fix. I had to add a custom latex parser since we we're nesting index nodes inside of code nodes and sphinx wasn't handling this well.
  4. Making the code blocks support more text without wrapping is a trade-off. I tried messing around with margins, and changing the font size.
    • I could go down to font size 10pt to keep the 1in margins.
    • I went down to 0.5in margins to keep the 12pt font size. (This is what I ended up keeping)
  5. Not sure what was happening here, but its fixed now...

Regarding a combined index, I created a combined section 19 that merges together the keywords, their origins, descriptions, and hyperlinked section references.

Other things I fixed along the way...

  1. The coordinate calculate flowchart in Appendix VI was displaying properly in the HTML but it was not displaying at all in the PDF. I got it to display in the PDF by changing the image format from .svg to .png. (since LaTeX doesn't handle .svg images well).

Other Issues I'm still working on, or @ehsteve if you have time to look at these

  1. Text in tables is not wrapping properly in the PDF. For example, the second table in Appendix II has text wrapping in the HTML, and looks funky, but the PDF has the text cut off and not wrapping at all.
  2. There are extra blank pages in the PDF that I can't seem to get rid of. I've tried a few things, but nothing has worked so far.
  3. The "References" section appears after the appendics in the original document, and is listed as section 11. However, in our conversion we list the references after Part C. and do not give it a section number. This creates a strange transition where Part A now ends with section 10 and Part B picks up with section 12.
ehsteve commented 1 month ago

This is looking great @Alrobbertz IMHO!

Regarding the SVG figure, we could potentially use svglib to convert the file to a PDF. Latex can generally include those without too much trouble.

For your other issues

  1. Text wrapping in tables is a known problem. There are a few known solutions though I'm not sure if any of them are available to us.
  2. The blank pages may be caused by this.
  3. We need input from @steinhh and @tfredvik on this one please!

Also, could the two of you have a look at the new table with section references? I think it would be good to add more explicit descriptions of each of the keywords. I'm wondering whether either of you are willing to help with this?

tfredvik commented 1 month ago

I agree that it might look a bit strange that Part A ends with Section 10 and Part B starts with Section 12, but in my opinion that’s just a cosmetic glitch that doesn’t really matter. The important thing is that the section numbering of all the other sections are the same as in the Word document. Whether the references section is Section 11 or not is all the same to me.

-- Cheers, Terje

On 1 Oct 2024, at 16:34, Steven Christe @.***> wrote:

This is looking great @Alrobbertz IMHO! Regarding the SVG figure, we could potentially use svglib to convert the file to a PDF. Latex can generally include those without too much trouble. For your other issues • Text wrapping in tables is a known problem. There are a few known solutions though I'm not sure if any of them are available to us. • The blank pages may be caused by this. • We need input from @steinhh and @tfredvik on this one please! Also, could the two of you have a look at the new table with section references? I think it would be good to add more explicit descriptions of each of the keywords. I'm wondering whether either of you are willing to help with this? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

ehsteve commented 1 month ago

@tfredvik if you are okay with the document as it stands please click approve and we can do a first release and fix those cosmetic glitches in a bug fix release. Thanks!

Alrobbertz commented 1 month ago

@ehsteve I fixed the index and blank pages. It looks like the table wrapping is going to be a bigger fix since it requires converting all the markdown tables into latex tables. Is this something you want to do as a part of this release or a future release?

ehsteve commented 1 month ago

@Alrobbertz i just want through the pdf and didn't see anything that bad with the tables so i'm okay with doing tweaking in a future release. We are just waiting for approval from @steinhh.

steinhh commented 1 month ago

Hi guys! First of all, thank you for an amazing job!

I don't mind approving the PR as such, things look quite good! The big (and real) step, though, is making it official by telling everyone on my distribution list about it and changing the doc on our web servers to point at the github/readthedocs/PDF versions. I.e., "to find the latest version, from now on go to https://.../".

The announcement should be accompanied by instructions on how to automatically be alerted about changes - preferably only when we think it would be worth-while, as we've done up until now whilst actually updating the on-line version as we go, which is why we call it the "live" version (most updates lately have been very minor, and often just edits for clarity).

Do we do that by having a release branch which people can follow/watch, while we do updates continuously on a develop branch and/or with minor pull requests to that one. With people actually using the develop branch on a daily basis, the release branch being just a way to get alerts (and to get distinct versions to use in references)?

Do we point people to https://.../.../latest/? I really like the TOC there, and I miss it sorely in the https://.../.../latest/generated/ version. The side panel navigation is not as good by far. The side panel is also not sticky (to the viewport) so it does not add anything compared to the very clear TOC on the https://.../...latest/ page. Any way to carry the TOC over into the https://.../.../latest/generated/ page, or vice versa to put the contents onto the https://.../.../latest/ page?

Then, I have to expose more of my ignorance w.r.t. github:

Where's the meat? Sorry, the "code" ;-). Going to the repo home page (https://github.com/IHDE-Alliance/solarnet_metadata/) it's basically empty, and cloning the repo doesn't give me anything more. I suppose this is b/c all the content is in this pull request? If I want to do tweaks, is maneuvering to specific files in the PR the only way, I can't edit on my laptop? That would only be possible after this PR is completed, am I right?

Speaking of tweaks:

  1. The .../latest/generated/ page and PDF document title is just "Solarnet", it should of course be "Solarnet Recommendations for ..." as on the latest/ page.
  2. The second paragraph says "If you want to find the latest released or past version, see the releases". Mhm... where? I guess it's difficult to fix this before we have a release though ;-)
  3. Upon release/before announcement the readme should no longer point to arXiv and the sdc.uio.no sites, but rather the readthedocs page(s)

I will approve the PR unless you think the above comments warrant otherwise. Then that will be the first release, and we should have someone here play guniea pig w.r.t. new releases/alerts, then announce it to the world with clear instructions :)

steinhh commented 1 month ago

Terje didn't see what my "TOC problem" was (but then on the other hand he had not even noticed the full TOC in the latest/ version either). If you guys don't get what I meant, tell me and I'll try to be more explicit. As mentioned before, this is not a showstopper in any way, just a (strong) wish from my side. I may be a bit brain-damaged from working with the document for so many years, but I'm used to "scroll to the top for a complete, clear TOC". Now it's "scroll to the top for a less readable nav bar". I realise that the browser's back button solves the problem to some extent, but if you've clicked more than once on the nav bar, you'll need more than one back button press.

Alternative things that would help would be

  1. Indentation of the section titles in the nav bar, would make it much easier to identify the sections
  2. Expansion of the sections (always) in the nav bar. Improves readability (especially now that there is no indentation for main section titles), but it's also easier to find the right place to go to (the subsection titles provide more context for the main section titles).
  3. Having a sticky nav bar - no scroll to the top needed. Though... well, it's more than one page's worth ain't it... not sure how that would work ;)
Alrobbertz commented 1 week ago

Hi @steinhh,

I think we'd like to merge these changes into the main branch as-is and then we can fix any of the follow-on issues in separate pull requests.

The GitHub release process uses Git Tags to create a unique version number associated with a specific commit on the main branch. This alleviates the need to have two branches for "release" and "development". The latest commits on the main branch can be considered the dev/latest/live version, and we can tag specific commits when we want to generate an official released version. This will eventually populate all the versions on the GitHub repository's release page, although its empty until we create our first release.

This release process will allow users in the web page to view the documentation for each release as well. For example, the Sunpy project uses the same code tools that we are using here, so you can view the Sunpy documentation page as an example. In the bottom right-hand corner there will be a menu that allows you to select a version of the webpage documentation to view. On the sunpy page the default is below: image

Once you click on this menu it will give you the choice of different versions of the documentation to look at on the web page, including released versions (6.0.1, 6.0.2, 6.0.3 as of the time I write this) as well as the latest/live version which is at the head of their main branch. image

Right now the meat and the code only exist in Steven's fork of this repository and in this Pull Request. Once the pull request is merged, then the meat will be in the "/docs/source" folder from the root of the repository. Other files in the "/docs" folder are mainly code used to generate the webpage and PDF versions of the documentation. I'd be happy to go through some examples of how to edit and modify the content with you once we have this PR merged.

Unfortunately the TOC issue might be complicated to solve, and require some more changes to how the code builds the webpage. I'm happy to experiment with this in the future, but I think we'd like to get this first version merged in if possible.

Final Questions:

  1. Can you approve this Pull Request if you agree with path forward and the GitHub release process?
  2. When we do generate a release, what version would you like to use? I see the latest on you server is "2.2-live". Would you like our first release here to be v2.3 or v2.2.1? None of the main content has changed compared to the "live" version, only formatting and our indexing. My recommendation would be to use a v2.3 but I am open to suggestions.
steinhh commented 1 week ago

Hi Steve!

I approved the merge now, but I'm still not clear about the release process - what I'm looking for is a way for end users to get notified when and only when we decide it's time for a new release. Is that possible?

ehsteve commented 2 days ago

@steinhh, if they have a github account they can follow this repo and will automatically get notifications but i assume you want something broader and automatic (besides you sending your own email?). I would actually suggest that you keep sending your own emails to notify people regardless of us adding an automatic notification scheme for that personal touch. People usually start ignoring automated notifications (i know i do!). Let us look into creating something automatic as well. I'll create an issue. Once we create the first release you'll see how it looks on github as well.

Btw, did you answer @Alrobbertz question about the version number you want to use? I recommend you consider Semantic versioning (see https://semver.org). To quote

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

It is designed for code but people in the community understand what it means. Given that this release has not changed to the actual text, i would consider it a MINOR or PATCH version but given that it is a major change in format and distribution perhaps it deserves a MAJOR version. What do you think?

steinhh commented 2 days ago

On 12 Nov 2024, at 15:26, Steven Christe @.***> wrote:

@steinhhhttps://github.com/steinhh, if they have a github account they can follow this repo and will automatically get notifications but i assume you want something broader and automatic (besides you sending your own email?).

Actually I want something narrower and less automatic, in a sense ;-). If people follow the repo, won't they see everything? I want people to be able to follow a specific branch, then only get notified about commits to that one.

I would actually suggest that you keep sending your own emails to notify people regardless of us adding an automatic notification scheme for that personal touch. People usually start ignoring automated notifications (i know i do!).

Good point about people ignoring automated emails - though it's dependent on how frequent they are. We're not planning on doing official releases often.

That being said, the manual method doesn't take much time. BUT it requires people to get in touch with me to get on the list.

Let us look into creating something automatic as well. I'll create an issue. Once we create the first release you'll see how it looks on github as well.

:thumbs-up: :)

Btw, did you answer @Alrobbertzhttps://github.com/Alrobbertz question about the version number you want to use? I recommend you consider Semantic versioning (see https://semver.orghttps://semver.org/). To quote

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes MINOR version when you add functionality in a backward compatible manner PATCH version when you make backward compatible bug fixes Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

It is designed for code but people in the community understand what it means. Given that this release has not changed to the actual text, i would consider it a MINOR or PATCH version but given that it is a major change in format and distribution perhaps it deserves a MAJOR version. What do you think?

Hmm.. thought I'd responded to @Alrobbertz... anyway, I may have changed my mind since then: we should bump the major version to reflect the "under new management" status, as we did between SOLARNET 1 and SOLARNET 2.

Using MAJOR.MINOR.PATCH... I'm a little bit uncertain here, there may be a conflict caused by having code & document in the same repo. I think people will be quite surprised if the major version was bumped with no significant changes to the document. Not a disaster, but the interpretation of a major version bump as a breaking change will almost never apply to the document (I hope!). I guess I won't mind too much, assuming that a version change does not mean the same as a releases. As in we make one release at e.g., 3.1 but hold on with the next release until version 3.8, because that's where the document changed significantly.

Stein