almond-sh / almond

A Scala kernel for Jupyter
https://almond.sh
BSD 3-Clause "New" or "Revised" License

Is there a way to print Spark DataFrames as HTML tables? #180

Open skattoor opened 6 years ago

skattoor commented 6 years ago

That would be neat. I searched around but couldn't find what I was looking for. Any help is appreciated!

Aivean commented 6 years ago

You can add your own helper function that does that, like this:

import org.apache.spark.sql.DataFrame

implicit class RichDF(val ds:DataFrame) {
    def showHTML(limit:Int = 20, truncate: Int = 20) = {
        import xml.Utility.escape
        val data = ds.take(limit)
        val header = ds.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map { cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }

        publish.html(s""" <table>
                <tr>
                 ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map { row =>
                  s"<tr>${row.map{c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)        
    }
}

Result: image
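For anyone who wants to check the cell formatting on its own, here is a minimal sketch that pulls the match and the truncation rule out of showHTML into plain functions with no Spark or kernel dependency. The names CellFormat, render, truncate and formatCell are made up for illustration; the logic is meant to mirror the snippet above, not replace it.

```scala
object CellFormat {
  // Turn one cell value into text, using the same cases as showHTML.
  def render(cell: Any): String = cell match {
    case null                => "null"
    // Byte arrays are shown as space-separated, two-digit hex values.
    case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
    case array: Array[_]     => array.mkString("[", ", ", "]")
    case seq: Seq[_]         => seq.mkString("[", ", ", "]")
    case other               => other.toString
  }

  // Shorten text to at most `limit` characters; a limit of 0 or less disables truncation.
  def truncate(str: String, limit: Int): String =
    if (limit > 0 && str.length > limit) {
      // Below 4 characters there is no room for an ellipsis, so just cut.
      if (limit < 4) str.substring(0, limit)
      else str.substring(0, limit - 3) + "..."
    } else str

  // Hypothetical convenience wrapper combining both steps.
  def formatCell(cell: Any, limit: Int = 20): String =
    truncate(render(cell), limit)
}
```

With this, CellFormat.formatCell("abcdefghij", 5) gives "ab..." and a limit of 3 on "abcdef" gives "abc", which is the same behaviour as the truncate parameter of showHTML.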

kretekpodnietek commented 6 years ago

I cannot find publish.html(). Where does it come from? Which library provides it?

Aivean commented 6 years ago

@kretekpodnietek, it should be available in the jupyter-scala kernel: https://github.com/jupyter-scala/jupyter-scala#displaying-html--images--running-javascript

skattoor commented 6 years ago

@Aivean: Never took the time to thank you for this. This works beautifully and I made good use of it ever since you answered. Thank you very much ! 😃

Aivean commented 5 years ago

@kretekpodnietek, it should be available in the jupyter-scala kernel: https://github.com/jupyter-scala/jupyter-scala#displaying-html--images--running-javascript

@Aivean Could you please share that publish.html()? I'm not able to find it.

@venkatnbcu, from what I can tell, the API has changed since I posted this snippet. A quick search shows that publish is now a member of kernel: https://almond.sh/docs/api-jupyter.html#display-data

jaketripp commented 3 years ago

Thank goodness this is possible!! 🙌

I know this was for the Almond kernel, but for anyone else using the Apache Toree kernel, I managed to adapt this and thought I'd share:

import org.apache.spark.sql._

implicit class RichDF(val df: DataFrame) {
    def view(limit:Int = 20, truncate: Int = 20) = {
        import xml.Utility.escape
        val data = df.take(limit)
        val header = df.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map { cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }

        kernel.display.html(s""" <table>
                <tr>
                 ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map { row =>
                  s"<tr>${row.map{c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)        
    }
}
...

df.view()
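Since the publish call differs between kernels and versions (publish.html in the original snippet, kernel.display.html in the Toree one), one option is to build the table as a plain string and pass the kernel's display function in as a parameter. The sketch below is only an illustration: HtmlTable, escapeHtml, render and show are made-up names, and escapeHtml is a small dependency-free stand-in for scala.xml.Utility.escape, not the same implementation.

```scala
object HtmlTable {
  // Minimal HTML escaping; an assumed stand-in for scala.xml.Utility.escape.
  def escapeHtml(s: String): String =
    s.iterator.map {
      case '&' => "&amp;"
      case '<' => "&lt;"
      case '>' => "&gt;"
      case '"' => "&quot;"
      case c   => c.toString
    }.mkString

  // Build the <table> markup from already-formatted header and cell strings.
  def render(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val head = header
      .map(h => s"<th>${escapeHtml(h)}</th>")
      .mkString("<tr>", "", "</tr>")
    val body = rows
      .map(_.map(c => s"<td>${escapeHtml(c)}</td>").mkString("<tr>", "", "</tr>"))
      .mkString
    s"<table>$head$body</table>"
  }

  // Hand the rendered table to whatever display function the kernel provides.
  def show(header: Seq[String], rows: Seq[Seq[String]])(publish: String => Unit): Unit =
    publish(render(header, rows))
}
```

In a Toree notebook the last argument would presumably be something like html => kernel.display.html(html), and in a recent Almond something like html => kernel.publish.html(html), going by the link earlier in the thread. I have not verified either against a specific kernel version, so treat those calls as assumptions.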