Documentation & ScalaDoc

derrickoswald commented 5 years ago

This issue tracks the tasks around documentation in general and ScalaDoc in particular.

- update manually generated source files (e.g. CIMRelation options, new members of CHIM) to include ScalaDocs for all classes
- publish ScalaDocs on github.io somewhere
- update CIMTool to generate 'best-effort' ScalaDoc for generated classes
- split out CIMReader “interactive usage” as INTERACTIVE.md tutorial using spark-shell or Zeppelin

mbheinen commented 5 years ago

I want to try to tackle this one but need a bit of guidance in some areas.

update manually generated source files (e.g. CIMRelation options, new members of CHIM) to include ScalaDocs for all classes
- CIMRelation documentation seems straightforward, but what new CHIM members are you referring to? Most of the methods and classes in that file seem to have documentation on them already.
publish ScalaDocs on github.io somewhere
- Are you thinking of an update to the existing https://derrickoswald.github.io/CIMSpark or something else?
update CIMTool to generate 'best-effort' ScalaDoc for generated classes
- What do you mean by 'best-effort'? Can you give an example?
split out CIMReader “interactive usage” as INTERACTIVE.md tutorial using spark-shell or Zeppelin
- Is this just splitting out the "Sample Interactive Usage" section already in https://github.com/derrickoswald/CIMSpark/blob/master/CIMReader/README.md into its own markdown page and expanding on the content?

Also, if I can't do it all we may need to open separate issues to track some of them.

derrickoswald commented 5 years ago

This is a good starter topic because someone with new eyes has questions - that could be answered with documentation. ScalaDocs should be useful to new users of the code as well as experienced users. The CHIM class is just one example that was created very early in my Scala odyssey and could use some love.

If you look at the ScalaDoc you'll see most of it is undocumented. You don't have to do everything, but maybe just the things that cause a WTF? reaction.

True, the ScalaDocs are published on github.io - I guess my ToDo list wasn't up to date.

By 'best effort' I'm referring to places where the .eap file contains lists or formatting instructions in addition to text, e.g.

- TopologicalIsland has a dashed list that comes out as flat text
- LoadDynamics has HTML markup
- enumeration MeasurementTypeEMS is obviously a list but it comes out as flat text

Achieving this fully is obviously bordering on AI, but a few pattern recognition utilities could make a big difference to readability.
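As a rough illustration of one such pattern recognition utility, here is a minimal sketch that turns a flattened dashed list into ScalaDoc list syntax. The object name `DocLists`, the method `dashedToScalaDoc`, and the sample text are all hypothetical, not part of CIMTool. It assumes list items in the extracted text are separated by a dash with whitespace on both sides, which may not hold for every description.

```scala
// Hypothetical helper, not part of CIMTool: a sketch of a 'best-effort' list detector.
object DocLists {
  // Assumption: each list item is introduced by whitespace, a dash, and whitespace.
  // Hyphenated words such as "best-effort" have no surrounding whitespace and are left alone.
  def dashedToScalaDoc(text: String): String = {
    val parts = text.split("\\s+-\\s+").map(_.trim).filter(_.nonEmpty)
    if (parts.length < 3)
      text // fewer than two candidate items: probably not a list, so leave the text unchanged
    else
      // ScalaDoc unordered list items start with a space, a dash, and a space
      (parts.head +: parts.tail.map(item => s" - $item")).mkString("\n")
  }
}
```

Calling `DocLists.dashedToScalaDoc("Changes occur due to - switches operating - manual edits")` on that made-up sample returns the introduction on one line followed by two ` - ` items, while text with only a single dash is returned unchanged. HTML markup, as in LoadDynamics, would need a separate rule.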

Yes, it's mostly just breaking out the existing text and making it current and easy to try on your own. You should try it and fix the places that make you stumble.

For the Zeppelin connection to the sandbox see https://zeppelin.apache.org/docs/latest/interpreter/spark.html which is mostly true, and:

needed to add SPARK_HOME to the Spark interpreter page (form for arbitrary stuff at the bottom)
conf/zeppelin-env.sh:
- export ZEPPELIN_PORT=8980  # to avoid 8080 used by Spark GUI
- export SPARK_HOME=/somewhere/spark/spark-2.4.3-bin-hadoop2.7/
- export SPARK_SUBMIT_OPTIONS="--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1"  # to access the spark-cassandra connector
- export HADOOP_HOME=/somewhere/spark/hadoop-2.7.6
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

The last two are to allow access to HDFS assuming you have that set up in core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sandbox:8020</value>
  </property>
</configuration>

Do what you can and punt the rest to another Issue.

derrickoswald / CIMSpark

Documentation & ScalaDoc #13