DefinitelyTyped / tsd

[DEPRECATED] TypeScript Definition manager for DefinitelyTyped
Apache License 2.0

Tooling + proposal to iterate/clean/import tsd and DefinitelyTyped #21

Closed Bartvds closed 11 years ago

Bartvds commented 11 years ago

This is a big one:

I wanted to add a bunch of definitions to tsd, but I didn't feel like editing JSON by hand, so I wrote some code and RegExps to do it.

I had too much fun and experimented with underscore/async, so now I have a beefy dev library that loops over a DefinitelyTyped checkout, indexes the definitions, parses the standard headers, etc.

I also have code to compare that index to the contents of a tsd checkout's /repo_data and see what's missing. For headers that roughly conform to the standard, it can also output the tsd JSON files.
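
For illustration only, a minimal sketch of tolerant header parsing in TypeScript. It assumes the DefinitelyTyped header convention of the time (`// Type definitions for <name> <version>`, `// Project: <url>`, `// Definitions by: <name> <url>`); the `DefHeader` shape and the `parseHeader` function are hypothetical and are not the actual tsd-deftools code.

```typescript
// Hypothetical shape of a parsed header; the field names are illustrative only.
interface DefAuthor {
  name: string;
  url: string | null;
}

interface DefHeader {
  name: string;
  version: string;
  projectUrl: string | null;
  authors: DefAuthor[];
}

// Tolerant patterns: case-insensitive, optional whitespace after "//", optional "v" before the version.
const NAME_RE = /^\/\/\s*Type definitions? for\s+(.+?)\s+v?(\d[\w.\-]*)\s*$/im;
const PROJECT_RE = /^\/\/\s*Project:\s*(\S+)/im;
const AUTHORS_RE = /^\/\/\s*Definitions by:\s*(.+)$/im;

// Split "Name <url>" into its parts; the <url> part is optional.
function parseAuthor(raw: string): DefAuthor {
  const m = /^\s*(.*?)\s*(?:<([^>]*)>)?\s*$/.exec(raw);
  if (m === null) return { name: raw.trim(), url: null };
  return { name: m[1], url: m[2] || null };
}

// Returns null when the first header line is missing, so the file can be flagged for manual fixing.
function parseHeader(source: string): DefHeader | null {
  const nameMatch = NAME_RE.exec(source);
  if (nameMatch === null) return null;
  const projectMatch = PROJECT_RE.exec(source);
  const authorsMatch = AUTHORS_RE.exec(source);
  return {
    name: nameMatch[1],
    version: nameMatch[2],
    projectUrl: projectMatch !== null ? projectMatch[1] : null,
    // Multiple authors are assumed to be comma-separated on one line.
    authors: authorsMatch !== null ? authorsMatch[1].split(",").map(parseAuthor) : [],
  };
}
```

Loose variants such as `//Type definitions for Node.js v0.8.12` still parse; a file without a recognizable first line returns `null`, which is where the semi-manual sweep would pick it up.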

But it turns out there are a lot of weird header formats, non-standard names and casing mismatches between the two repos, and some tsd entries don't match the naming used in DefinitelyTyped.

It annoys me, and since I have the code now I can do a bunch of automated/semi-manual work to expand, clean and update this, but I need your opinion before I go ahead:

Some options:

  1. I can list the DefinitelyTyped files with wonky headers and go in manually to fix all the headers so they conform to the standard. The only tricky part is the number of pulls/merges and the collisions with other people who are editing.
  2. Naming is a mess. A lot of entries in tsd don't use the names as they are in DefinitelyTyped. For example, linq in DefinitelyTyped is linqjs in tsd, and waa is Web-Audio-API-Nightly. Cleaning this up might be tricky with regard to backwards compatibility.
  3. For every definition whose header parses but that is not in tsd, I can generate the JSON and add it. This would be best done after 1. The complication here is the messy existing naming from 2.

I already decided that fully automated stuff is not going to work out, but I'm willing to do a semi-manual sweep on these issues:

  1. I want to do this as soon as somebody is willing to merge the changes quickly.
  2. needs some thinking. If you still plan a new version (0.4 or 0.5), maybe we can straighten this out properly then. If you're interested, I can output lists of non-matching files and put some numbers on what's what.
  3. I can do for everything that's not listed in tsd at all.

Bartvds commented 11 years ago

I stashed my code here: https://github.com/Bartvds/tsd-deftools

It's utility code, so it's not for general use or behind a nice API, but it can do a lot of semi-automated bulk editing.

Diullei commented 11 years ago

Nice! I'll take a look and get back to you with feedback.

Bartvds commented 11 years ago

Oh, the code is still under construction. I'm exposing the functionality so it's friendlier to use than just hacking the scripts.

So it's not ready to be used as intended yet.

I was more hoping for your thoughts on the issues I mentioned.

Bartvds commented 11 years ago

I've pushed an update with the first command exposed. It does the basic compare and has a working readme.

Here's a gist of the output: https://gist.github.com/Bartvds/5605522

The numbers seem to make sense when I verify them by hand, but it's still preliminary data.
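
A sketch of what such a basic compare might look like, assuming the only inputs are the DefinitelyTyped folder names and the tsd package names; `compareNames` and `CompareResult` are hypothetical names, not the tsd-deftools API.

```typescript
// Hypothetical result shape for a DefinitelyTyped vs tsd name comparison.
interface CompareResult {
  missingInTsd: string[]; // DefinitelyTyped folders with no tsd entry at all
  casingMismatch: string[]; // same name ignoring case, e.g. "Backbone" vs "backbone"
  unknownInTsd: string[]; // tsd names with no matching DefinitelyTyped folder
}

function compareNames(dtFolders: string[], tsdNames: string[]): CompareResult {
  const tsdExact = new Set(tsdNames);
  const tsdLower = new Set(tsdNames.map((n) => n.toLowerCase()));
  const dtLower = new Set(dtFolders.map((f) => f.toLowerCase()));
  const result: CompareResult = { missingInTsd: [], casingMismatch: [], unknownInTsd: [] };
  for (const folder of dtFolders) {
    if (tsdExact.has(folder)) continue; // exact match: nothing to report
    if (tsdLower.has(folder.toLowerCase())) result.casingMismatch.push(folder);
    else result.missingInTsd.push(folder);
  }
  result.unknownInTsd = tsdNames.filter((n) => !dtLower.has(n.toLowerCase()));
  return result;
}
```

A renamed entry such as linq/linqjs shows up on both sides (missing in tsd and unknown in tsd), so a plain name compare overcounts until renames are linked some other way.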

Diullei commented 11 years ago

Very cool! Let me help you!

1) We will fix the file headers in the DefinitelyTyped repository.
2) I can help normalize the tsd and DefinitelyTyped lib names; I'll see what can be done to fix them.

I want to automate adding new definitions to the tsd repo.

Good job! I'll fork your code and contribute.

Bartvds commented 11 years ago

The parsing and output are progressing nicely. My local version can now recursively resolve <reference> tags and create dependencies as well, and I'm playing with the JSON generator. It's not quite there yet, though :)
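
A rough sketch of recursive `<reference>` resolution, assuming the definition files are already loaded into an in-memory map keyed by repository-relative path (for example `jquery/jquery.d.ts`). `referencesOf`, `resolvePath` and `collectDependencies` are hypothetical helpers, not tsd-deftools code; the `seen` set keeps reference cycles from recursing forever.

```typescript
// Extract the paths from triple-slash directives like: /// <reference path="../jquery/jquery.d.ts" />
function referencesOf(source: string): string[] {
  // A fresh regex literal per call, so the global lastIndex state is never shared across calls.
  const re = /^\s*\/\/\/\s*<reference\s+path=["']([^"']+)["']\s*\/>/gm;
  const paths: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) paths.push(m[1]);
  return paths;
}

// Resolve a relative path against the directory of fromFile (POSIX-style, no leading slash).
function resolvePath(fromFile: string, relative: string): string {
  const parts = fromFile.split("/").slice(0, -1); // directory segments of the referencing file
  for (const seg of relative.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== "." && seg !== "") parts.push(seg);
  }
  return parts.join("/");
}

// Collect all transitive dependencies of entry, dependencies-first.
function collectDependencies(entry: string, files: Map<string, string>): string[] {
  const seen = new Set<string>([entry]);
  const order: string[] = [];
  const visit = (file: string): void => {
    const source = files.get(file);
    if (source === undefined) return; // unknown file: a real tool would report a broken reference
    for (const ref of referencesOf(source)) {
      const dep = resolvePath(file, ref);
      if (seen.has(dep)) continue; // guards against cycles and duplicates
      seen.add(dep);
      visit(dep);
      order.push(dep);
    }
  };
  visit(entry);
  return order;
}
```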

This morning I realized I can write a reverse lookup using the TSD data! It's easy to loop over it and resolve/link the naming changes automatically. I could also look up the fields for the declarations I cannot parse and add some reporting on external URLs. I could even keep the GUIDs across renames.
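
A minimal sketch of the reverse lookup idea, assuming each tsd entry records the URL of the DefinitelyTyped file it installs (the `TsdEntry` shape and the URL layout are assumptions). The folder in that URL gives the DefinitelyTyped name, so any entry whose tsd name differs is a rename.

```typescript
// Hypothetical tsd entry: the package name plus the URL of the definition file it installs.
interface TsdEntry {
  name: string;
  uri: string;
}

// Take the DefinitelyTyped folder from a URL ending in ".../<folder>/<file>.d.ts".
function definitelyTypedFolder(uri: string): string | null {
  const segments = uri.split("/").filter((s) => s.length > 0);
  return segments.length >= 2 ? segments[segments.length - 2] : null;
}

// Map DefinitelyTyped folder -> tsd name, keeping only the entries where the two differ.
function findRenames(entries: TsdEntry[]): Map<string, string> {
  const renames = new Map<string, string>();
  for (const entry of entries) {
    const folder = definitelyTypedFolder(entry.uri);
    if (folder !== null && folder !== entry.name) renames.set(folder, entry.name);
  }
  return renames;
}
```

Casing mistakes surface here too, since `JQuery` and `jquery` are different keys.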

If that pans out, I might as well just regenerate most (all?) declaration headers. The RegExps for the headers are pretty tolerant, but if I regenerate them I could auto-clean all of them to a stricter spec :)

Side note: I'm massively expanding/refactoring the code, so don't put too much time into the fork yet, as it'll be merge hell :)

Bartvds commented 11 years ago

If that all works, it could also be a good time to (re)evaluate the header/JSON format.

For example, maybe the header could use a separate, longer Description: row? Most authors seem to like having a description, and it's messy to have it in the top row next to the name/version.

It could also be good to allow multiple author entries (in both the header and the JSON), since updates can be quite laborious and deserve some credit.

That could be interesting to think about and should involve @borisyankov as well.

Bartvds commented 11 years ago

I pushed a big update with 2 new commands in the CLI: listParsed shows the current parsing capability, and recreateAll writes the tsd JSON for the parsing results to a folder (out, from the paths config). For now it also outputs invalid/incomplete parses (flagged, so I can see what's going on).

Still very much work in progress of course.

Diullei commented 11 years ago

I understand; the Description property could be the last property.

An idea: the list of authors could be retrieved from git. That way we could print all the authors of a definition file.

What do you think about it?

Bartvds commented 11 years ago

Nice idea to parse git. I see git modules for node on npm, so that should be doable (I also see a GitHub API wrapper, which could be useful sometime).

One difference, though, is that DefinitelyTyped currently uses a name + URL as the author, while git stores a name + email. I don't think it's cool to extract people's email addresses and put them online, publicly accessible, without their explicit consent (email is a bit sensitive, so I'd favor the safe choice here). Also, this would list everyone, even people who only made very minor bug fixes.

We could do a hybrid, though: have one (or multiple) self-proclaimed Author: Name <url> rows for the initial version and big revisions (nice if companies contribute and want their URL listed), and also extract the git user names and put them, comma-separated, in a new Contributors: name1, name2, name3 row.
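
A sketch of the hybrid proposal, assuming the contributor names come from `git log --format=%an -- <file>` (the `%an` placeholder prints only the author name, so no email addresses are collected). The `Author:`, `Description:` and `Contributors:` rows follow the proposals in this thread and are not an existing header standard; `contributorsFromGitLog`, `renderHeader` and `HeaderModel` are hypothetical.

```typescript
interface Author {
  name: string;
  url: string;
}

// Hypothetical header model following the proposal; not an existing tsd format.
interface HeaderModel {
  name: string;
  version: string;
  projectUrl: string;
  description?: string;
  authors: Author[];
}

// gitLog: output of `git log --format=%an -- <file>`, one author name per line.
// Declared authors are excluded; duplicates are dropped and the rest sorted for stable output.
function contributorsFromGitLog(gitLog: string, authors: Author[]): string[] {
  const declared = new Set(authors.map((a) => a.name));
  const names = new Set<string>();
  for (const line of gitLog.split("\n")) {
    const name = line.trim(); // also strips a trailing "\r" from Windows output
    if (name !== "" && !declared.has(name)) names.add(name);
  }
  return Array.from(names).sort();
}

function renderHeader(model: HeaderModel, contributors: string[]): string {
  const lines = [
    `// Type definitions for ${model.name} ${model.version}`,
    `// Project: ${model.projectUrl}`,
  ];
  if (model.description) lines.push(`// Description: ${model.description}`);
  for (const a of model.authors) lines.push(`// Author: ${a.name} <${a.url}>`);
  if (contributors.length > 0) lines.push(`// Contributors: ${contributors.join(", ")}`);
  return lines.join("\n");
}
```

Filtering out minor fixes (say, by commit count) would be a separate policy decision on top of this.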

I'm not sure, though; I'd have to see it first, so I think I'll just spit out some code for that later and see how it looks (I have to parse me some git now :)

Bartvds commented 11 years ago

Another field that's missing and could be useful is the URL of the GitHub repo of the original JavaScript library, or something similar, so we can get at its package.json (or bower.json, component.json, or all of them) and analyse that for info.

The main use is scraping and comparing version numbers, but it would open up more possibilities as well, since it forms a technical link between the definition and the code; the project URL that's currently in the headers mostly points to a human-oriented website.

I'm not sure how to handle multi-use libraries for this, like those that work in both node.js and browsers (and now even Windows, and maybe more). What's also crappy is that the first library I checked (angular-js) lists 0.0.0 in its package.json :)
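
A sketch of the version comparison, assuming the definition header carries a version like `1.8` and the library's package.json has a `version` field. It is deliberately not full semver (no pre-release handling), and it treats an all-zero version such as `0.0.0` as unknown rather than as a real release; `checkVersion` is a hypothetical helper.

```typescript
type VersionCheck = "up-to-date" | "outdated" | "unknown";

// Parse the leading numeric part of "1.8", "v0.8.12" or "1.2.0-rc1"; null if there is none.
function parseVersion(v: string): number[] | null {
  const m = /^v?(\d+(?:\.\d+)*)/.exec(v.trim());
  return m !== null ? m[1].split(".").map(Number) : null;
}

// Compare a definition header version with a package.json "version" field.
// Only the parts the header specifies are compared, so header "1.8" counts as current for "1.8.3".
function checkVersion(headerVersion: string, packageJson: { version?: string }): VersionCheck {
  const lib = packageJson.version !== undefined ? parseVersion(packageJson.version) : null;
  const def = parseVersion(headerVersion);
  // An all-zero version such as "0.0.0" is treated as a placeholder, not a real release.
  if (lib === null || def === null || lib.every((n) => n === 0)) return "unknown";
  for (let i = 0; i < def.length; i++) {
    const libPart = i < lib.length ? lib[i] : 0;
    if (def[i] < libPart) return "outdated";
    // The header claims a newer version than the library: needs a human look.
    if (def[i] > libPart) return "unknown";
  }
  return "up-to-date";
}
```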

This needs a bit more thought as well.

Bartvds commented 11 years ago

A quick update: I'm still working on this, but I got side-tracked a bit, mostly figuring out how to do unit testing (both for this and in general) and some unrelated things.

Anyway, I was reworking the definition header/file parser to be more flexible (e.g. less hard-coding) so it can more easily adapt to new fields or formats.

I'll keep exploring and building and try to make it properly testable; maybe it'll be useful for something later on, perhaps during or after the update of DefinitelyTyped to TypeScript 0.9.0.

Diullei commented 11 years ago

Hi @Bartvds! You have done a good job on tsd-deftools. Would you like to become a collaborator on tsd repo?

There is much to be improved in TSD ;)

Can I add you as a collaborator?

Bartvds commented 11 years ago

Thanks for the offer, that would be awesome! :)

I'd be happy to become a collaborator. I have to admit I'm not very experienced with open-source-style collaborative development, but I do have plenty of serious production-shop experience, so it'll work out.

Let me know what would be a practical starting point, and what channel we should use to communicate ideas and such: GitHub issues will work, but other projects use a Google Group, which I think is better suited to open discussion (and looks very pro as well :)

Diullei commented 11 years ago

Added! Welcome to the team! ;)

Do you use Google Talk? My Google Talk account is diullei@gmail.com.

Bartvds commented 11 years ago

Nice! I see [read + write] access.

I'm not a regular Google Talk user, but I set one up on bartvanderschoor@gmail.com and I've sent you a message. Real-time chat would be interesting, as I'm almost on the other side of the planet from you (Netherlands... tsd worldwide :)

I have a habit of writing wordy chunks of text; maybe I'll keep those on GitHub or something like Google Groups, for easy review.

Bartvds commented 11 years ago

Sorry for the text blobs in my previous messages; I replied from within Gmail, and now that I'm back in the normal browser view I see GitHub apparently ignored all the line breaks (weird).

But I'm looking at tsd now: I started from the develop-0.4.x branch, I've installed and built it, the command-line module seems to work, and I can also rebuild the repo data JSON/JS (after I removed the git clone grunt task).

Of course I have some startup questions:

Bartvds commented 11 years ago

Also, the code looks very nice, but please note that I use WebStorm, not Visual Studio.

So am I missing a tool? I see you have many files in .gitignore, but I assume I can do without those, right? Everything seems to work fine, though.

Diullei commented 11 years ago

Ok! Give me some time to review these questions.

Diullei commented 11 years ago

I sent you an email to bartvanderschoor@gmail.com.

Bartvds commented 11 years ago

Closing this to start tracking the 0.4 milestone.