= Write Free Science Books to Get Famous Website
:idprefix:
:idseparator: -
:sectanchors:
:sectlinks:
:sectnumlevels: 6
:sectnums:
:toc: macro
:toclevels: 6
:toc-title:

MOVING to: https://ourbigbook.com

Mission: live in a world where you can learn university-level mathematics, physics, chemistry, biology and engineering for free whenever you want from perfect open source books made for free by random people who want to get famous to get better paying jobs.

toc::[]

== Desired social impact

Crush the current grossly inefficient educational system and replace today's students + teachers + researchers with unified "online content creators / consumers".

Gamify them, and pay the best creators so they can work on it full time, until some company hires them for more since they are so provenly good.

Destroy useless exams; the only metrics of society are either:

Reduce the entry barrier to education, like Uber has done for taxis.

== Key algorithms

The key innovation of the website is to use the following algorithms to rank users and posts, while avoiding the concept of "elected human moderators" at all costs.

=== PageRank with tags

This is the central and most important algorithm of the website.

The website will look a lot like a hosted blog like https://wordpress.org or link:https://medium.com/[], but with the following additions:

From these inputs, we want to answer, using algorithms, the following questions:

This is the central algorithmic innovation that we want to implement.

If a user has high reputation for a tag, say C++, then:

Just like for PageRank, this leads to circular chains of influence, e.g.:

A way to solve this problem is to model it as an eigenvalue problem.

==== PageRank with tags sketch

We do not know exactly what the algorithm will be, but we believe that the PageRank analogy is valid. The algorithm could look something like this.

If we forget tags to simplify, we could do a bipartite authors / posts graph:

To consider tags without weight, in addition:

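A minimal sketch of such an iteration, assuming a plain bipartite voter / post graph (no tags yet) where posts feed reputation back to their authors; the damping and normalization choices below are our own placeholders, not a settled design:

[source,python]
----
# Sketch: PageRank-like reputation on a bipartite voter / post graph.
# Assumptions: votes[voter] is the list of posts that voter upvoted, and
# author_of[post] is the post's author. Damping and normalization are arbitrary.

def reputation_sketch(votes, author_of, iterations=50, damping=0.85):
    users = set(votes) | set(author_of.values())
    rep = {u: 1.0 for u in users}            # start with uniform reputation
    for _ in range(iterations):
        post_score = {}
        for voter, posts in votes.items():
            if not posts:
                continue
            share = rep[voter] / len(posts)  # voter spreads reputation over votes
            for post in posts:
                post_score[post] = post_score.get(post, 0.0) + share
        new_rep = {u: (1.0 - damping) for u in users}
        for post, score in post_score.items():
            new_rep[author_of[post]] += damping * score  # post feeds its author
        total = sum(new_rep.values())
        rep = {u: len(users) * r / total for u, r in new_rep.items()}  # renormalize
    return rep

# Example: two voters, two posts by different authors.
votes = {"alice": ["p1"], "bob": ["p1", "p2"]}
author_of = {"p1": "carol", "p2": "alice"}
print(reputation_sketch(votes, author_of))
----

Tags could then be handled by running one such iteration per tag, or by splitting each vote's weight across the post's tags.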

=== Newer is better

On Stack Overflow for example:

We must include in our post score and user reputation a time factor, so that recent votes count more than old votes.

It would be even more awesome to have a parameter that controls how much time matters, and then this would allow us to cover a wide variety of post types:

The Reddit ranking algorithm does this reasonably well: https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
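One possible way to expose such a time parameter would be an exponential decay with a per-post-type half-life; this is only an illustration, not the Reddit algorithm itself:

[source,python]
----
import time

def vote_weight(vote_timestamp, half_life_days, now=None):
    """Weight of a vote that halves every half_life_days days.

    A large half_life_days approximates "timeless" content such as proofs,
    a small one suits news-like posts. The exponential form is an assumption.
    """
    now = time.time() if now is None else now
    age_days = (now - vote_timestamp) / 86400.0
    return 0.5 ** (age_days / half_life_days)

# A 30-day-old vote at a 30-day half-life counts half as much as a fresh one.
print(vote_weight(time.time() - 30 * 86400, half_life_days=30))  # ~0.5
----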

Even better would be to consider how many times users view EACH post in a single page, with some JS black magic. With that, we can just use the Wilson score interval https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval as mentioned at: https://www.evanmiller.org/how-not-to-sort-by-average-rating.html
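For reference, the lower bound of the Wilson score interval mentioned above can be computed like this (z = 1.96 gives a 95% confidence level):

[source,python]
----
import math

def wilson_lower_bound(upvotes, total, z=1.96):
    """Lower bound of the Wilson score interval for an upvote proportion.

    `total` would be views of the post (or up + down votes); see the
    Evan Miller article linked above.
    """
    if total == 0:
        return 0.0
    phat = upvotes / total
    denom = 1 + z * z / total
    center = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (center - margin) / denom

print(wilson_lower_bound(90, 100))  # ~0.83; wilson_lower_bound(9, 10) is only ~0.60
----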

SO threads:

Non SO literature:

=== Tag duplicates

How to mark tags java and Java as being duplicates without moderators?

Possible solution: everyone can mark tags as duplicate.

Why would people waste time doing that? Because once you mark tags as duplicates, if you search for one, you will see both, so you waste less time searching.

Then we need some algorithm that fuzzily joins all subjects that many people said are the same.
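One simple way such a fuzzy join could work, assuming we merge two tags once enough distinct users have marked them as duplicates, is a union-find over tag names; the vote threshold below is an arbitrary placeholder:

[source,python]
----
class TagMerger:
    """Union-find over tag names; tags are merged once enough users agree."""

    def __init__(self, votes_needed=5):
        self.parent = {}
        self.votes = {}               # frozenset({a, b}) -> set of voter ids
        self.votes_needed = votes_needed

    def find(self, tag):
        self.parent.setdefault(tag, tag)
        while self.parent[tag] != tag:
            self.parent[tag] = self.parent[self.parent[tag]]  # path halving
            tag = self.parent[tag]
        return tag

    def mark_duplicate(self, user, tag_a, tag_b):
        pair = frozenset((tag_a, tag_b))
        self.votes.setdefault(pair, set()).add(user)
        if len(self.votes[pair]) >= self.votes_needed:
            self.parent[self.find(tag_a)] = self.find(tag_b)  # merge the groups

    def same(self, tag_a, tag_b):
        return self.find(tag_a) == self.find(tag_b)

m = TagMerger(votes_needed=2)
m.mark_duplicate("u1", "java", "Java")
m.mark_duplicate("u2", "java", "Java")
print(m.same("java", "Java"))  # True: a search for one can now show both
----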

This is one of Quora's focuses: https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs

=== What is the best revision of a given content?

The website will have GitHub-like pull requests to content.

No one can ever edit your posts unless you explicitly allow them.

This prevents edit wars which can only be resolved with moderation.

But you can make your own copy (fork) according to the required website content license (CC-BY-SA), and make a precise suggestion, which can be merged with a single click (like GitHub pull requests).

But then what happens if:

Possible solution:

== Further features

The following less-algorithmic features must also be present.

=== Post trees

It must be possible for users to create trees of posts.

When a teacher wants to create a course, for example, he can just link existing material into the course material tree.

And only if something is missing, then he may write it.

Pull requests can be made for additions to the post tree, just as for regular content.

The best way to do such a tree would be something along the lines of:

To do this we will need to find a highly extensible JavaScript WYSIWYG text editor.

https://github.com/JefMari/awesome-wysiwyg

==== Quill

https://github.com/quilljs/quill/

Has out of the box:

TODO:

==== TinyMCE

https://github.com/tinymce/tinymce

=== Tags and post bijection

It would be awesome if all tags mapped to posts.

This way, a post would serve as the description of a tag.

For example, the tag mathematics should map to a post mathematics, which explains what Mathematics is, and contains a tree of children nodes which are sub-subjects, e.g. algebra, calculus, etc.

Furthermore, when a user puts the algebra post as a child of mathematics, this is equivalent to saying "tag my Algebra article with the mathematics tag".
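A minimal sketch of how a single parent / child edge table could serve both purposes, where a post's tags are simply its ancestors in the tree; all names below are illustrative only:

[source,python]
----
# Sketch: one edge table serves both "child post" and "tag" semantics.

children = {}  # parent post id -> set of child post ids

def add_child(parent, child):
    """Placing `child` under `parent` is the same as tagging `child` with `parent`."""
    children.setdefault(parent, set()).add(child)

def tags_of(post):
    """A post's tags are its ancestors in the tree."""
    tags, frontier = set(), {post}
    while frontier:
        frontier = {p for p, kids in children.items()
                    for c in kids if c in frontier and p not in tags}
        tags |= frontier
    return tags

add_child("mathematics", "algebra")
add_child("algebra", "group-theory")
print(tags_of("group-theory"))  # {'algebra', 'mathematics'} (order may vary)
----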

=== Comments

Comments and pull requests are analogous, and are stored separately from regular nodes such as mathematics.

Comments and pull requests are more like "meta posts, with optional titles".

Comments are like GitHub issues, which are very similar to pull requests.

Comments are tied specifically to a given post.

E.g., if user 1 and user 2 each make their own pages entitled Mathematics and Algebra, the Algebra page of either user could often be a child of the Mathematics page of either user.

Comments on the other hand, are tied to a single Mathematics page of a single user.

Forks however should inherit all comments and pull requests.

=== Export and local editing

It would be awesome if the website could export and re-import an entire tree as, say, Asciidoctor for the following reasons:

The main question then is what to do about header IDs and links.

After the following features are implemented however:

we can just go for:

....
[[cirosantilli/header-visible-id,data-id=12345678,data-tags=mathematics,physics]]
== My header
....

where:

Then for imports:

=== Secondary further features

== Secondary algorithms

These are further algorithms that would also be worth investigating, but which are not the most critical ones in our opinion.

=== Vote ring prevention

This would counter voting fraud, e.g. by close groups of friends who upvote each other a lot.

Malicious users, or innocent users from close-knit research communities, might end up voting each other a lot.

We would like to have an algorithm such that every time you upvote the same given person, it has less positive impact on his reputation for that tag than the previous upvote.
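A minimal sketch of such a diminishing-returns rule, where each repeated upvote from the same voter on the same author and tag is worth half of the previous one; the exact decay curve is an open choice:

[source,python]
----
from collections import defaultdict

# How many times each voter has already upvoted each author on each tag.
upvote_counts = defaultdict(int)

def upvote_value(voter, author, tag):
    """Each repeated upvote from the same voter on the same tag counts half as much."""
    n = upvote_counts[(voter, author, tag)]
    upvote_counts[(voter, author, tag)] += 1
    return 0.5 ** n

print(upvote_value("alice", "bob", "c++"))  # 1.0
print(upvote_value("alice", "bob", "c++"))  # 0.5
print(upvote_value("alice", "bob", "c++"))  # 0.25
----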

=== Original research vs explanations

How to determine if something is "original research" or not?

E.g.: a genius discovers something and publishes it really badly explained.

Someone less intelligent comes, explains it better, and gets widely read.

Or someone who just posts a bunch of links to good sources.

=== User trusts user

It would be cool for a user to say: I trust this other user on given tags / all tags.

Maybe this is required: e.g., given a real network, a bot network could make an exact copy of it, and purely structural algorithms would then give the copy the same reputation as the real one.

Such relations make per-user score of other users / posts even more important.

=== Per user score of all other users

Rate how much one user likes other users based on his actions.

E.g.: someone who only upvotes C questions will give score 0 for someone with only Java questions.
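One possible way to turn those actions into a number is the cosine similarity between what one user upvotes per tag and what the other user posts per tag, so the C-only voter indeed gives 0 to the Java-only author; the metric itself is just one candidate:

[source,python]
----
import math

def affinity(upvotes_by_tag, posts_by_tag):
    """Cosine similarity between one user's upvotes per tag and another's posts per tag."""
    common = set(upvotes_by_tag) & set(posts_by_tag)
    dot = sum(upvotes_by_tag[t] * posts_by_tag[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in upvotes_by_tag.values()))
    norm_b = math.sqrt(sum(v * v for v in posts_by_tag.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(affinity({"c": 10}, {"java": 7}))           # 0.0: no overlap at all
print(affinity({"c": 10, "linux": 2}, {"c": 3}))  # ~0.98: strong overlap
----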

=== Tag hierarchy extraction

We could deduce that animal includes dog if a lot of articles tagged as dog are also tagged as animal.
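A rough heuristic sketch: guess that one tag implies another when almost every post carrying the first also carries the second; the 0.9 threshold below is arbitrary:

[source,python]
----
def implies(tag_a, tag_b, tagged_posts, threshold=0.9):
    """Guess that tag_a is a sub-tag of tag_b (e.g. dog -> animal) if almost
    every post tagged tag_a is also tagged tag_b.

    tagged_posts maps post id -> set of tags.
    """
    with_a = [tags for tags in tagged_posts.values() if tag_a in tags]
    if not with_a:
        return False
    both = sum(1 for tags in with_a if tag_b in tags)
    return both / len(with_a) >= threshold

posts = {
    1: {"dog", "animal"},
    2: {"dog", "animal", "pet"},
    3: {"animal", "cat"},
}
print(implies("dog", "animal", posts))  # True: every dog post is also an animal post
print(implies("animal", "dog", posts))  # False: the converse does not hold
----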

== Prototypes

Very early stage:

== Datasets

A hard part in testing the algorithms is that it is difficult to obtain data in the first place.

Besides the possibility of bootstrapping data ourselves by <>, these are some existing datasets that could be used:

=== Crossref

https://support.crossref.org/hc/en-us/articles/213126066-Datasets-database-

Likely largest database of DOI metadata. They also issue DOIs.

Data comes from multiple journals, and each one has a different metadata set. Some don't even have cross references, and most have authors by name only instead of ORCID.

You have to belong to a journal to be listed there at all.

They host the metadata only.

=== PubMed

Smaller than <<crossref>> since it is only for bio-related stuff, but despite that it does not even seem to be much more uniform anyway...

Download data from: https://www.nlm.nih.gov/databases/download/pubmed_medline.html

TODO how are references encoded? Example.

Most authors don't have an ORCID, just a string name. ORCIDs are in an optional field.

Most journals don't have keywords, but at least those that do have keywords nicely split in the XML.

On the other hand, it has a bunch of more bio-specific fields, such as which chemicals the paper mentions... lol, they can't standardize the most important data, but they can add stuff like this.

PubMed data represents the central topic of an article through the MajorTopicYN field, which is interesting.

== Business model

=== Business model difficulties

== TODO

I have to organize this part better.

:leveloffset: +2

== Research

Software:

StackApps:

General reputation systems:

Concept maps:

Social network:

=== PageRank

Implementations:

Mathematical problem: make a stochastic matrix from the link graph, where each entry equals:

Now calculate the steady state of the Markov process: https://en.wikipedia.org/wiki/Markov_chain#Steady-state_analysis_and_limiting_distributions which is the same as calculating the eigenvector of the matrix for eigenvalue 1.

Convergence of the simple iterative algorithm on a stochastic link matrix M holds iff M is both (TODO proof):
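For reference, a minimal power-iteration sketch over a column-stochastic link matrix; the damping factor 0.85 is the usual choice from the original PageRank paper:

[source,python]
----
import numpy as np

def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a column-stochastic link matrix.

    links[i][j] = 1 if page j links to page i. Dangling pages are given
    uniform outgoing links so the matrix stays stochastic.
    """
    a = np.array(links, dtype=float)
    n = a.shape[0]
    col_sums = a.sum(axis=0)
    a[:, col_sums == 0] = 1.0            # dangling nodes link everywhere
    m = a / a.sum(axis=0)                # normalize each column to sum to 1
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * m @ rank
    return rank

# 3 pages: pages 0 and 1 link to page 2, page 2 links back to page 0.
links = [[0, 0, 1],
         [0, 0, 0],
         [1, 1, 0]]
print(pagerank(links))
----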

Proposal to use it on Stack Overflow:

PageRank tutorials and papers:

PageRank alternatives:

PageRank variants:

== Websites with tag votes by any user

== Misc websites

Traditional websites with good content model:

No publishing innovation there, but inspirational presentation structure and scope.

=== Get free DOIs

DOIs are identifiers for articles, and are what current research uses as identifiers.

https://academia.stackexchange.com/questions/81583/are-there-free-doi-generation-services

link:https://arxiv.org[]: you need to get an endorsement from someone who has at least three published papers in a given magic category. This then gives you free DOIs, which make your stuff visible to third-party rankers like Google Scholar. PDF uploads. Meh.

==== Figshare

https://figshare.com 2018

You can upload a bit of description text, which can change, but the files are unchangeable.

Forces you to select from a magic tag / category list.

DOIs of type: https://doi.org/10.6084/m9.figshare.6248786.v1 and those links redirect you to the content.

Magic URLs have a version suffix for multiple versions of the same content, but this is just a convention adopted by Figshare.

TODO: ORCID login?

==== Zenodo

https://zenodo.org/

== Cool people and movements